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Giapecre &. DNTERODUCTION 


A. Description of the Problem 


The problem addressed in this thesis is the 
non-parametric estimation of population quantiles. Given a 


random variable xX with continuous @istribution function 


F(e), we define the a-guantile s as the solution to the 
a 


equation 
(1) F(s ) =a 
a 


for some given value of a between 0 and 1. We shall assune 


in what follows that Ss is unique, i.e. that we are dealing 
a 


with continuous Or partly continous distributions. 


Sompretely discrete distributions with relatively small 
numbers of atoms present a much simpier estimation problen. 
Quantiles find application, for example, in testing 
statistical hypotheses and in characterizing the extreme 


Values of the distribution of X when a 1S near 0 or 1. 


‘At the outset we note that there is a related problen, 


namely, given a value s, to estimate the quantity p given 
S 


by 
(2) F(s) =p. 
S 


ee cm meee ee ee ee ee 


The value p found in this way will be called a percentile. 
S 


Percentiles may be used, for example, to find the power of a 


Statistical test under a non-null hypothesis. By way of 


_ 10 
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contrast we note that a is the known value in (1) while s is 


the known in (ay: 


The non-parametric estimation of percentiles is 
relatively straightforward; the number of values of the 
random variable less than s in a random sample i ae ae 
X is clearly a binomial random variable with parameters n 

n 


and p so that this number divided by n is an unbiased 
Ss 


estimator of p. 
S 


Me the distribution function F(e} in (1) is completely 


known, finding s becomes a problem Of numerical 
a 


approximation, 1.e. one must evaluate 


(3) 5 Se ai ea 
a 
Note that if the random variable kK has an infinite support 
the slope of F(e) will be very small in one or both taiis of 
the distribution (i.e. as the gquantile level a approaches 0 
or 1); this means that in evaluating (3) for extreme 
quantiles one is likely to encounter serious numerical 
Mmictabilities. if the distribution function Fe; @) is 


known except for a finite vector 9 of unknown parameters we 
may still proceed as in (3) provided we have some eStimate 8 


of the parameters. The resulting parametric estimate of s 
a 


1S given by 
(4) ome t=) { a7) ) . 
a 


The properties of S will depend on both the underlying 
a 


11 
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distribution F(ej and the nature of the estimate 6; the 
sampling variation of §, however, is likely to increase the 


numerical difficulties with extreme yguantiles. 


If nothing 1s known about F(°*), one must resort to 


non-parametric orc distribution-free methods for estimating 
s . Non-parametric quantile estimation is considerably more 
a 


complex than non-parametric percentile estimation. TWO 


solutions have been proposed for this problem (Goodman, 


Lewis and Robbins [14]): the order statistic estimator, S ; 
a 


and a class of stochastic approximation estimators, S . 
a 


The order statistic estimator is obtained by sorting 


the random sample a 6 Mite Cpeemux into order, thus 
n 


determining the order statistics X oe Pee. oS ‘ 
(1) (2) (7) 


Then the estimator is 


(5) 6 =X : 
a (Ta (n+1)} }) 


where {z] denotes the integer part of z. It is known (David 


foo) that © has an asymptotically normal distribution with 
a 


(6) | ETS J =s + O(1/n) 
a a 


and 


2 : 
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A 
me > Varfs = a({il- a + “O(n-< 
(7) (S.] = atis.a eins 
a 
where £(x) = F'(x) is the density function of the random 
Weabiable X. Unfortunately, the time required to order a 


complete sample of size n is proportional to n In nn; thus 
the computational effort for this estimator increases faster 
than the sample size. Furthermore, considerations of finite 
computer memory size limit order statistic estimators to 
Samples of perhaps 10,000 observations (less if several 
distributions must be investigated at once as might be the 
case in a systems simulation study). We discuss some other 
considerations relating to order statistic estimators in 
Chapter III; because partial sorting can be done in time 
proportional to n some improvement is possible, but these 


estimators still suffer from serious shortcomings. 


To overcome these drawbacks, we consider a sequential 
estimation scheme. This may be defined by a saquence of 


functions fh }; our estimates are given recursively by 
n 


(8) S (j+1) =h (s (J), X. > Je tee = le 
a ance 3+ 1 


where 5 (Jj) is the estimator at step j of the procedure. In 
a 


the sequel, we denote this j-th sequential estimator by s , 
J 


Suppressing the dependence on a when this will cause no 


Comeusion. 
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4 


B. Stochastic Approximation Estimators 


The most important class of functions to be used in 
sequential quantile estimation schemes are stochastic 
approximation estimators. There is an extensive literature 


on so-called stochastic approximation methods; these methods 


are intended to find the root x = @ of the regression 
function 
(9) fei N(x) = a, 


where the only information available consists of independent 
observations on the random variable Y(x). We note that this 
is a more general problem than the quantile estimation 
problem considered here. Most WOrkK on Stochastic 


approximation has been concerned with specifying conditions 


under which the seguence of estimators converges 
probabilistically to the correct valve. Many of these 
conditions are trivially satisfied in the quantile 


estimation case; for example, the regression function will 


mmays be bounded Since it 15 a distribution function, F(x). 


The simplest type of stochastic approximation quantile 
estinators are based on the work of Robbins and Monro [ 30}. 


They are defined by the relationship 


(10) S =S -ayY (Ss), Mateos 86 
n+1 n bon nn 


In this formulation {a }. is a sequence of positive constants 
. 
of the form 


14 
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and Y (§ ) is a random variable which aepends only on X and 
n on n 


S$ and which is defined by 


n 


(12) miss eae if X => Ss 
n n 


The initial estimate ot and the parameter A may be chosen 


fieonctraraily or at random. 


The procedure given by (10) is called a Robbins-Monro 


(RN) process; under suitable conditions (which are satisfied 
Pye !O0)—(12) as long as oe Goer Slit | and  Dvoret zlay 


{7} have shown 


(13) S$ -->s almost surely (a-.S.), 


(14) dam EE (Ss »— s j)2)} = 0. 
n- > n a 


Furthermore, Sacks {33] has shown that if F(x) has a 


continuous derivative f(x) at s then 


L 
15 S --> N S a 
>) n ( a’ AA (2 


as long as 0 < A < 2f£(s ). The asymptotic variance is 
a 


Minimized by taking A = f(s )3; this results in the same 
a 


15 : 
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Meymptotic normal distribution for s§ as for the order 
se Ld a AN 
Biatr Stic estimator, s . 


a 


C. Improving the RM Estimators 


An intuitive discussion of the operation of the RM 
process (10) will serve to point out ways in which the 


resulting guantile estimators can be improved. First, we 


note that the sequence {S } 1S a Markov process, although a 
n 


non-homogeneous) one. Moreover, as long as A is fixed, § 
n 


n 
may take on one of only 2 distinct values at stage n. This 


1S because Y is a discrete random variable: it increases 


n t 


the estimate value ("step up") when the latest observation 
is larger than the current estimate and decreases the value 
("step down") when the observation is smaller. 
The actual magnitude of the step is governed by the 
gain seguence {a }. ‘The factor 1/n in (11) is necessary so 
n 
that successive steps become smaller, thus allowing the 
: o 8) 
estimator to converge; however, since 2 (1fn) = © the 
nN = 
seguence of estimators can reach any quantile value s 
a 
Starting from an arbitrary initial value a Note however 
that if S is still far from s for even moderately large n, 


nN a 
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a prohibitive number of steps may be needed to obtain a 


reasonable estimate. 


The first improvement to the basic RM process was 
suggested by kKesten [18]. To cut down the number of Steps 
required to converge to the true value after the difference 


Ss -s becomes large, the divisor n in (11) is modified so 
n a 


that it is increased only when the current step direction 


differs from the step taken at the previous stage. This 
suggests that we have "straddled" the true quantile value. 
Although the stochastic approximation estimator obtained in 
this way has the same asymptotic distribution as the &M 
estimator (Davis [6]), its convergence properties in small 
samples seem to be superior (Cochran and Davis [4]; Davis 
ae) The Kesten procedure does have the disadvantage, 
however, that it often fails to reduce the step size even 


when §s is close to s . The optimum procedure is probably 
n a 


to keep the step size constant until S is "close" to s_ and 
n a 
Meme tO Cabmy out the usmal RM procedure. Such a "delayei" 


process has been studied by Cochran and Davis {4} and Davis 


[6]. 


A related difficuity with the basic RM process 1S that 
1t does not work well at all for the estimation of even 
moderately extreme quantiles (a < 0.25 or a > 0.75). This 
problem was first noted by Wetherill (36]; he traced the 
difficulty to the slow rate of increase of the harmonic 


series ean) when k >> 1. 
n= 
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A solution to this problem was developed by Gocdman, 
Lewis and Robbins [14]. Instead of carrying out the 


operation (10) for every sample value X we use only the 
n 


maximum (or minimum for a < 0.5) of some number of 


observations, say v, where v is chosen so that 
(16) a =at =Q0.5 ‘ 


The RM process can then be applied to estimate the 
a'-quantile of the maxima (or minima); this has the sane 
value as the a-quantile of X. The basic idea is to use a 
data transformation to shift the problem to the estimation 
of a population median, for which RM is known to be 
well-behaved. It is unnecessary to go all the way to the 
median; good results are obtained for Ones GS a) <0, 
Convergence rates are apparently much improved bv this 
procedure; the cost, as Goodman, Lewis and Robbins [14] 


Show, 1S an inflation of the asymptotic variance 


(17) Var[(s'] = Var[s ] a (1 - atl 
n n va'(T- a 


moemost cases the inflation is less than 40 &. 


A natural extension of this so-called maximum 
transformation process is to consider a next-to-maximum 
transformation, 1.e. applying the RM process (10} to the 
second largest (or smallest) in a sample of size w where 

w- 1 W - 

(18) wa = (vowel — a’ = 0.5 - 

The appeal of this procedure in dealing with highly skewed 
real world data is that it may give a more robust estimation 


procedure. GQneceweadain, there is an inflation of the 


asymptotic variance 


18 





Non-parametric QJuantile Estimation Through 
Stochastic Approximation 


(19) VAehee=vartps | _ UNG) = a 
n Nn W 


The inflation is somewhat greater in this case than for the 
Maximum transform but it may still be limited to less than 


50 % by the proper choice of w. 


In the remainder of this thesis, a single prime (as in 


at or sS') will denote an estimate or parameter which is 
n 


based on the maximum transform while the double prime (e.g., 


S") will denote a next-to-maximum transformed value. Except 
n 


for equations (17) and (19), a subscript n appended to a 
primed value will indicate the number of steps taken by the 
corresponding stochastic approximation process and not the X 
Sample size, which will be larger. In fact, we will need at 


least nev X. observations to obtain S'; more will be needed 
n 


Mmiemene initial estimate = is chosen at randon. 


For efficient estimation of a set of several quantiles 
we prefer to use v (or w) values for higher quantiles which 
are integral multiples of the values for lower quantiles; 
this greatly simplifies determination of sample maxima and 
Minima. In this research, a set of 19 quantiles has been 
arbitrarily selected; these include the 16 quantiles of 
Goodman, Lewis and Robbins [714] together with the median 
feoeoeo) and the quartiles (a = 0.25, 0.75). fYhe values or 
V and w for each of the transformation schemes together with 
the respective variance inflation factors are shown in Table 
ihe 
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Stochastic Approximation 


Having dealt with the effects of the ifn term in 
gain sequence fa} we now consider the parameter A. 
O(n71) variance implied by (15) will result when A is not 
too large, i.e. when the initial step size is not too snall. 
It is known (Major and Revesz [26]) that the order of 
a V aa vs W al yt 
.001 672 ~4895 1.425 1316 24542 1.476 
-002 BaG 4897 WHS 768 24543 ifs ere 
~005 112 ~ 4296 12338 384 29725 1.608 
.0106 56 ~ 4304 leosG 192 AS as A 1.608 
O20 28 ~4320 A334 96 od S 1.606 
2025 28 5 Re (e437 48 3083 iea23 
2050 14 go23 1.426 24 SoZ 1.420 
«| Oe) 7 ora 1.402 12 -34F0 1.414 
5) 1 ~ 2500 1.000 6 24661 1.474 
900 1 - 5000 1.000 3 ~5006 ig3es 
7790 1 iO 1.000 6 OS 39 1.414 
2900 7 STEVES) 5) e402 2 -6590 1.4174 
2250 14 ~4877 1.426 24 -66 08 1.420 
ao 75 28 ~ 4922 Vet 7 48 OO Wy lee 
200 28 A Sees) e334 96 52.55 1.606 
-990 56 .5696 1.336 102 ee 5266 SOUS 
B29 5 112 ~5704 wa3i3 8 384 24274 1.608 
5 Shspe 336 Pe) (10) 6) 1.425 768 ~5457 1.476 
.9399 672 75 105 Wed 25 1536 25458 1.476 
Table I. Sample sizes, transformed levels and variance 
inflation factors for maximum transformation (v, a' and V°} 
and next-to~maximum transformation (w, at’ and V") stochastic 


approximation quantile estimation designs. 
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convergence may be substantially worse when A 2 2f{s). 
a 
When the optimum value A = £(s ) 15 cnosen, the RM process 
a 
acts like steepest descent approximation with small steps; 


the steps are the same as those for a linear approximation 


to the distribution function through the point (s , a) 
a 


(Fabian [10 }). 


Evidently the initial choice of A has an important 
influence on the efficiency of the basic RM process, but in 
generai the magnitude of the effect cannot be determined 
Since a(S} 1s unknown. In fact, the asymptotic normality 


Memoe stated by (15) cannot even be asserted Since it will 
n | 


not be known whether A < 2£fls). For this reason, we 
a 


= —<—- 2 — A a —_ eee eee ee eee 
eee ee ee eee — eee eee ee ee ee eee ——e i oe eee ee ee ee ee ee ee ee ee ee = oe ee ee ee ee 


== == —_=_-p os eee ed —- —e see ED ee ae ee we ee ce 


Practical application of Stochastic approximation 


quantile estimation then requires that we have both a 


starting value - and an estimate of f(s ). Although there 
a 


1S an improvement over order statistic estimators in both 


Speed and memory, the additional values required in the 
stochastic approximation case introduce a degree of 
complexity. In fact the selection of these two values is 
Semiieate to the feasibility of stochastic approximation 
quantile estimation and is one of the aain problens 


addressed and solved in this thesis. 
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a 


D. Venter's Method and Confidence Intervals 
The first method for simultaneously estimating s and 
a 
f(s ) is due to Venter [37]. Note that although this solves 
a 
the problem of finding a suitable A value we must still 
select an initial estimate ar. this is not nearly as crucial 


Oe as difficult as the choice of A. In Venterts method we 


observe two X values at each stage of the procedure and 


determine 
(20) Les - 2 HED x >s +c 
n 2n-1 n n 
i-4 1f£ X Se 4G 
2nh- 1 ~ n n 
and 
(21) y= - 3 if xX aS = ae 
n 2n n nh 
1- a isi X aS eC. on 
2n n n 


The seguence {c } 1S a sequence of positive constants calied 
n 
the finite difference sequence; it must satisfy 


E 
(22) Cee > Cy E> ee Vertoecwe < 08250 
n 


A sequential estimator of f(s ) is then given by 
a 


ytoe yl 
1 pen 


5 


u 


(23) A 


Ilys 


3) 


s 
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Finally to estimate 5 we apply the basic RM recursion 
a 


relation (10) with 


(24) Youre: Yi!) / 2 .. 
ba n n 


The latest estimate A of f(s ) is used in the gain sequence 
n a 


in the place of the arbitrary value A, i.e. we use the 


random value Caio a EOE ~ TG GO) ined Pract weal: 
application of the method to quantile estimation, we 
accumulate only the sum in (23) thus obtaining see this 
quantity is used directly as the denominator of the gain 
sequence (11). 

The chief practical difficulty encountered in using the 
estimator (23) is that a may become negative, in which case 


the RMN process will take steps in the wrong direction, or 


else A may get too large in which case the O(n-!) variance 
n 
Will be lost. For this reason, Venter uses as an estimate 


of f(s ) in the gain sequence the value h*, where 
a 


n 
S * x 
(25) A =a if A <a 
n 
4 
A Pods oS Db 
b if A o> b 
n 
XK x 
Mite te mt 15 KMOWn a priori thet a << £F(s) < b. As 
a 


3 
long as b is not too large, we have (Venter [37]}: 
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(26) s --> s aweSez 
n a 
(27) A --> £(s ) aessSros, 
n a 
and 
28) s = N { 1 
Ss -- } Ss - 3 ; 
n a Strzysst ! 


Thus, the Venter eStimator has the same asymptotically 


normal distribution as the other stochastic approximation 


estimators we have considered. (Recali that S is based on 
n 


a total X sample of size 2n in this case.) 


The advantage of the Venter procedure is that we no 


longer need an independent initial estimate of f(s ) since 
a 


the procedure converges for any initial value of f(s ) in 
a 


x x 
the interval (a ,b ). We also obtain (asymptotically) the 


minimum possible variance and we have the additional 


estimate A which may be used to determine a confidence 
n 


materval on s. Sielken ({34] and {35]) has investigated 
a 


the application of the Venter process to the estimation of 


confidence intervals and stopping times. 
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* 
The problen of finding the interval (a ,b)}) was soived 


by Fabian {9]; he suggested the use of 


* SI 
Gagieniease Ot L 172, 


29 é 
(29) a - 


2 F 
ty 


CC £Log (i+), 
2 
OC wie es 
1 Z 


From a practical point of view, we may establish the lower 


bound by setting nA to some small positive constant 
n 


whenever the accumulated sum becomes negative. Venter's 


pd 
results also indicate that the upper bound 0b may be 


Meeaerarily large when the density function is analytic in 


some neighborhood of s , so that this does not represent 4 
a 


restriction in many applications. 


E. A New Nethod 


A modification of the basic RM stochastic approximation 
process along the lines of Venter's work 1s the major 
Contes bution of this thesis. The new process 1S 
asymptotically equivalent to the other processes discussed 
Baer oechapter but itS finite sample properties seem to be 
much better. Just as in the case of the Venter process, ve 


obtain an estimate of f(s ) which is plugged recursively 
a 


back into the basic stochastic approximation relation; a 
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different technigue for density estimation is employed, 


however. 


In seeking an estimate of an unknown density function 
at some point one is lead to the work of Rosenblatt [32] and 
Parzen {28] on kernel estimators. A kernel function W({e) is 


a bounded integrable function with 


(30) [aco dx = 1 ° 
An example is the triangular weight function 


(31) W(x) = 1- [x aoe [S| SS 


0 otherwise. 


The empirical density function estimator at the point x is 
then given by 


te. 


’ 1 Nn , Dias ve 
82 ci X = > | ’ 
n 


where {bb } (Called the "bandwidth" seguence) is a seguence 
n 


of positive constants which tends to zero with increasing n; 
for example, 


=173 


(33) b > Ce 


iI 
= 
™~ 
o 


We now define an eStimator B of f(s ) uSing a kernel 
n a 


density estimator: 


(34) B= > 1 a - 3] 
C | ae 
J J 
and establish a new stochastic approximation process which 


uses the RM rectrsicn formula (10) with B replacing A in 
n 
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a 


the gain Sequence (11). 


One advantage of the new density estimator (34) is that 
we are able to take twice aS many steps aS in Venter'ts 
method for the same sample size; this seems to permit faster 
convergence in small samples. Some computational experience 
with the new estimators shows them to be far superior to any 
other known non-Darametric technique fOr Guantile 
estimation. Almost sure convergence and asymptotic 
normality for the new procedure are established in Chapter 
104 


F, Scope of Research 


The goal. of this thesis is to investigate the 
application of the stochastic approximation technigues 
described in this Chapter to the problem of non-parametric 
quantile estimation in the hope of developing a practical 
method which is fairly robust with respect to the underlying 
distribution F(e). The chief disadvantage in using any 
Stochastic approximation estimator - including Venter's 
procedure as well as the basic RM process - seems to be that 


in some cases the estinators are nowhere near Ss , even after 
a 


as many as 26,000 steps. It 1S in this casSe that the kM 
process (10) has the worst convergence rate because reaching 
the immediate neighborhood of the true value may require an 
astronomical number of additional steps. Unless this 
unfortunate tendency can be overcome, stochastic 
approximation estimators cannot be recommended in practical 


applications. 
Encouraging results have been achieved with the new 


estimator proposed here, particularly when it 1S combined 


With the maximum transformation technique and when some care 
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3B 
Le 
Mm 
=) 


Zeeeaken in selecting the starting value, are an 


entire set of quantiles is to be estimated a further 


improvement is possible. Since the quantiles are by 
definition ordered, 41 gross error ina single estimate cah 
often be detected because the erroneous value is usually out 
of order with respect to the other estimates in the set. In 
this case alternate types of estimate can be used to replace 
the erroneous one, thus bypassing the lengthy path that the 
stochastic approximation process requires to reach the true 
quantile value. Assuming that only one or two of the set of 
eStimates is in error, this approach should overcome the 
tendency of the stochastic approximation process to "blow 


mp. 


The thesis is organized as follows: in Chapter II, we 
establish the asymptotic properties of the new estimator and 
Bmews 1t to be equivalent to the Venter process aS nN -~> ». 
Chapter III describes some practical considerations relating 
to quantile estimation in finite samples of data using both 
order statistic and stochastic approximation estimators, 
While Chapter IV describes the results of an extensive 
digital computer simulation undertaken to determine the bias 
properties of the new estimator. Chapter V discusses the 
Simultaneous eStimation of an entire set of population 
quantiles and considers several techniques such as 
James-Stein estimation and isotonic regression to exploit 
the order relationships which are known to exist in such a 
set of estimates. Chapter VI discusses the estimation of 
Pinewron=s Of GQuantiles, in particular the estimation of the 
level of a test based on a given statistic and the 
estimation with the same simulation data of the power of the 
test. The last Chapter summarizes the work and discusses 


possible applications for the methods develop i. 
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In summary, this thesis describes a nethod i oOce 
estimating an entire set of quantiles with their 
corresponding densities for any statistic or other random 
quantity. The method is quite fast and uses a small fixed 
amount of memory; it is robust enough to be used as a basic 


building block in computer simulation programs. 


Meeelimitations Of Research 


In this thesis we deal only with non-parametric 
quantile estimators; substantial improvements are often 
possible if we know enough about the anderlying distribution 
function F(e) to apply maxinun likelihood OE other 
parametric estimates. For example, if F(e*) is the 


exponential distribution then 


(35) S$ = -p(X) ln (1 - a) 
a 


(where fi{Xj] denotes the sample mean) is the maxinum 


likelihood eStimator of s and is therefore asymptotically 
a 


fully efficient. Clearly, 


(36) F(S J} = - p ln (1 =- a) 
a 


= ; 
a 


Bemenat Ss is unbiased; furthermore, 
a 


(37) Var{S ] feetns (¥ = ya) je 
a 


ul 
n 
Ss = 


f N, 
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x 


which is at most 65 %& as large as the asymptotic 
non-parametric variance. AS a approaches. 0 or 1 the 
relative efficiency of the parametric estimator in this case 


becomes much greater. 


This work is also limited to the consid2ration of 
continuous or partly continuous distributions. When the 
random variable X has a completely discrete distribution its 
a-quantile may not exist or may not be unigue; to overcome 
this difficulty we may redefine the a-~quantile as the 


solution of 


(38) Mibea tS). 20a, 
Ss a 


which reduces to (1) in the continuous case. [t is not at 
all clear, however, that the solution to (38) has any 
reasonable interpretation, particularly if X has only a few 


atoms. 


The methods developed here have been investigated using 
only pseudorandom Simulation data and this 1S typical of the 
proposed applications for the techniques. Real world data 
Can certainly be used but the sample sizes required for 
reasonable results from stochaStic approximation quantile 
estimation are so large that only in special cases will 
sufficient observations be available. It seems likely that 
the next-to-maximum transformation will prove more useful in 
dealing with real data than was found to be the case with 
the artificial samples used here since there is usually more 
difficulty with outliers in the former case. AS Gaver and 
Lewis [12] point out the naximum transform wiil intensify 


any problems caused by outliers. 


Gremernaleiimitation of this work is that we consider 


30 





Non-parametric Quantile Estimation Through 
Swognastlc Approximation 


only samples with sequential independent observations; this 
Will clearly not be the case for much real world data or for 
many kinds of simulation studies. We may be able to apply 
our methods in the Simulation case by uSing the regenerative 
techniques of Iglehart £16} but the general problem of 
dependent observations is much more complex and is not 


considered further here. 
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meepeLinitions and Preliminaries 


Vomwtshe LOreStiMate the solution x = s to 
a 


P(x) = a, ORG lesan tig 


where F(e) is the distribution function of the randon 


Variable X. We assume: 


Bali's F (x) has a derivative £5) which is 


continuous in some neighborhood of s5s With 
a 


Pc jee e> UO. 
a 


BZ F"(x) exists and is bounded in some 


neighborhood of s . 
a 


Note that (F1) is sufficient for s to exist and be unique. 
a 
A seguential estimation scheme iS used with S_ the 


Nn 


estimate of s at step n. The initial estimate =e 1s chosen 
a 


arbitrarily (or at random with ee <o) and we apply the 


recursion 


wi 
tH 
YN 
{ 
fad) 
es] 
“« 


(1) 


n+ 1 n noon 


BZ 
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thy 


where Y 1s given by 
n 


(2) Waewee— ae Saf X > Ss ; 


I 
aod 
\ 
av) 
a 
rh 
>S 
IA 
wt 

8 


In (2) X 1s a random variable with distribution F(e«) which 
n 


| id ook jee 


1s assumed independent of ie < ; 
| G Boose 


The gain sequence {a } 1S given by 
n 


(3) a =t1/V/no0d=—, 
n n 


where d is essentially a "bounded" kernel density estimator 
n 


(see Rosenblatt [32] or Parzen [ 28 }): 


= 
(4) tice scan , “Win { B, C log(ntl) } 4, 
n 1 n 2 
with 0 < L < 1/4 and O < - < ose The estimator B is 
n 
defined by 
n 
2) a te ae, 
n n j=7 jj 
ak 
6 = ee j 
(6) Ls Se 


z 5 
Where {b } 1S a bandwidth sequence of positive constants 
n 


satisfying 


(7) b = O(n i ; re ail / 2 
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The function W(e) 1s called the kernel function; it is 


assumed to satisfy 


Wt. WeCa yD oO; = <0} SO 6 
W2. Sup W(u) = K <@. 
-~ocu<cc 
co 
W3. i W(u) du = 1. 
-o 
w4. lim fuW(u){ = 0. 
DUT | Seat e oXe- 


Note that W(e) ais a probability density under these 


assumptions. 
In what follows, we show first that S -> s almost 
n a 
Surely (abbreviated a.s.) and that d -> 8 a.s.; then, uSing 
n 
a theorem of Fabian [9], we develop the asymptotic 


Mmestercapution of Ss. Throughout, {Q,S ,P} will be a 
n 


probability space and B= NE BIS ooo. Fst a sequence 
n n- 


of o-fields (i.e., the smallest O-field with respect to 
which the indicated variables are measurable). 


We begin by rewriting the basic relation (1) in the 


form 
(8) Ss =s ~ T +0, 
nt 1 n n n 


in which we define 
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Le ssaetr(s ) = al, 

n n n 
U =-a Za, 

n joamtel 

(9) He 1 CO |e 
ni n n n 
= yo —- F (Sty + a ; 
n gt 
We note that |Z | < 1. 


n 


Since we will deal with sequences of the form (9), we 
begin by stating two lemmas relating to sequences of this 


type. Proofs may be found in Loeve [ 24}. 


Lemma 1 (Loeve) Let {V } be a sequence of random variables 
n 
ie 6) 00 
With Pee then 1f > BLV [Vi pese7V J] converges 
= J n n=1 n 1 n-1 
OO 
aes. , 2, V converges a.s. to a random variable. 
n= n 
o 
6) 
Lemma 2 (Loeve) If c(n) ->* and > ony Var ive |< omenen 
n=1 c(nj 4 n 
1 > om ECV Vv V > 0 
—- etal a ate Ase Se 
e(ny ey Bee poy 


B. Convergence of § 
n 


The proofs in this Section follow the lines of Blunm's 
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a 


moukK ([ 2 }. In fact, the convergence of S_ follows at once 
n 


from the bounds indicated by (4) (Fabian {9]j) if we are 


wrering to adopt a slightly different definition 9£ B . Now 
n 


we deal with the relation (8) and show 


1° @) 
Lemma 3 2, UY converges a.s. to a random variable. 
n= n 
HeOOL > 
Clearly, 
Ves =) Bae Ze | 
| n aa 
<n E[ 22 / d@] 
ne n n 
a2 
See (nl eA) 7 
CO 
Ssemthat > Var({U J} <o. 
n=1 n 
Now XX aS 2ndependent of {S ; K ,...,%X } and since 
n 1 1 n=l 
these random variables uniguely determine 4d we have 


n-1 


meee 8€6f © 3 = 0 a.s. hus, 
ie ni 1 n 


( 


ious a leche. ti Be 2. /a { Bj 
non n n-7 n n-1 n 


a 
nN Nn 


= Ef {1/7 (n-1)d lina 2 | 5 | 
é n- 1 n on n 


Bitetn 1a Gay 2 9/70 <1 Hae eae ae 
n-1 noe 


1 
n(n-T) n-1 0A n 


Now we use the definition (4) of d to set an upper bound: 
n 
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= | n Nn 


e EL Ind - (n-1)d 4 121 | B J 
n n-1 n n 
Ge amie 2 
Son (n- 1) Cale E (eel na = (m=) a i; | Bj 
1 n n-1 n 
where we have used the fact that |Z | $ 1. The relationship 
n 


DeLcoWecreole= ax c,d ] | S Maxf Ja = cl, |b -—- dj j 
and the definition (4) then imply that 


ioe Fr 1 ae © 


End) — (n— t)d i < Max { {C n = 1G fea) iL: 
n n= 1 1 
Poe (GS 2) ly 
n na 
leans Logarta = ©. (n-1) log nim 
2 2 
al =a 
Now the first term here approaches ay aed i 0 (1 ) as 
nh -->o sd in this case we have 
c= eesti 2 3h Sls 1 
[E[U | BU <n (n-1) C [C (1-L)n + O(n ja 
n n 1 1 
Ib 2 
= O(n ) a.S. 


For the last term we get 


@en Log(n*+1) - € (n-1)1log n Celogintd) e+ C in-ijtoqi 
2 2 2 2 n 


IA 


C log (n+1 
5 g ( ) 


3:1 
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= O(n HOG en). 


Finally we consider 


free (n-1) 8 | W 
n n-1 n 


K / b 
n 


lA 


H 


on), 


in view of (W2) and (7). Thus we conclude for this case 
eid t 


L~-1 L-1 -2 
mepoe {6 i] < av (n-1) Cc K / b 
n n 1 n 


Zit Gg — 2 
O(n lee 


00 ee 
We thus have that > |E{u | B Jf converges almost surely in 
| n n 


all three cases because of the definitions of L (4) and g 


(7). An application of Lemma 1 then completes the proof. 


Lemma 4 (Blum) S converges a.S. to a random variable. 
n 


Proof: 


iterating (8) back to a yields 


n n 
S =(S0 = eet ee 
n+1 1 521 5) 52 in 
so that 
n n 
(10) Ss Teele -eSsoet > 0 GOnverges dis., 
nt 1 j=1 j 1 AS ae 


in view of Lemma 3. Next we show 
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(11) Pr{ lin S =m} = Q. 
RES 


Suppose, for example, there exists a sample seguence {S } 
n 


ween lim S =co 3; then S S$ s for only finitely many n so 
i Bates n n a 


that T =a[{F(s ) - a] > 0 when nis large enough. Thus 
n n n 


Pita. own echyOGCurs With prorvability 
J 


p 
bt 
= 
ey 
nl 
+ 
whys 


n-> 0 n+1 5 


zero by (10). This establishes (11) and we similarly show 


(12) Pr{ lin S$ = ~~} = Q. 
no n 


Now suppose the lemma is false; then there must exist 


sample sequences for which 


ltys 


Ss + 


; I converges to a finite number 
n+ 5 


ace 
(73) 


lim inf S ¢ lim sup s . 
Na n n= n 


Letting {S } be such a seguence, we assume that lim sup S > 
n n 


S (a Similar argument handles the case lim sup S £5 Lor 
a n a 


then lim inf S§ <s by (13) ). We then choose numbers c 
n a 


paemaesuch that c > s and lim inf s <c<d< lim sup s. 
a n n 


In view of (5)-(7), a -> 0; and since § ae 
n n+] J=1 4 
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.Y 


converges, we may choose N so that N $< n <€ m implies 


(14) 


" 
{ 
inl 
+> 
re 
iA 


Seeo os 
m n jHn j Z 


Mmewmwe select m and n with N <n < m such that 


(15) Sed ¢ 
Ca Sams a FOL Nas } Sm 


We may clearly do this. Thus, 


m-1 
(16) Smeeece dec) PTs di =te-7 , 
m n iz jan j Z n 
peer =~ af{?F(s ) -~aj]> 0 fors 2¢%>s. Now if S > 
7 J J J a nh 
S we obtain 
a 
So BS 1G) ae 
1 n 2 
mmmmecontradiction of (15) which implies s - S$ > a-c. If 
m n 
S < s we have 
n a 
Seca eew ris 9) ) Sa »S di- Cc 
n n Z 
meomecri):; thus {16) becomes S - S £ d- Cc, which again 
m n 
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4 


Bemcradicts (15). This means no sequence (S } can satisfy 
n 


(13), thus establishing the lemma in view of (11) and (12). 


Oo 
Theorem 1 (Blum) S --> S a.S. 
n a 
Proof: 
We suppose Prf{ as S = S} = 1 as guaranteed by Lemma 4 and 
n=27 co fn 


we also suppose that Pr{S # s } > 0. Now we choose c and d 
a 


mens < C < d <a and Prfc < § < d} > Q. (Alternatively 
a 


we take -%o <cc<cd<s .) Then for every sample sequence 


Bamepeor which Wms = S, ¢c < S<d, we have c¢<s <a@ 
nh em Th n 


mor almost all n. Lemma 3 and Lemma 4 show that 


n n 
(17) ete - eee Ph (Ss) - a} COnverges|, 
Jota sg J 
Memever, F(S ) - a > F(c) - a > 0 for almost ail j so.(17) 
J 


must diverge because a 2 {j Se Hogqgiti)}>*:. this fellows 
J 


from the definitions (3) and (4&4) and the fact that o > cE 


Thus, 


n n 
On ee oe Us. Heeled(g+ j=?) Ol tog (togun) 
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This contradiction establishes the theoren. 


C. Convergence of d 
n 


We begin by proving three preliminary Lemmas. 


Lemma 5 Let {t (x)} be a sequence of meaSurable functions 
n 

uniformiy continuous for every n 2 N in some neighborhood of 

the point X € R with 


(18) ieem. tanomyee= t (Xx) 
n->0 n 
and {X } a sequence of random variables with 
n 


(19} Xo aod. Sel, 
n 


where X € R is a constant. Then 


(20) t (X ) --> t(X) a.s. 
n n 


moor s 
The convergence (18) implies that for each > 0, whenever 


ne 2 Nee ) we have 
121) tomes) etx) t < 7 7/2. 
n 


miewunitorm continuity of t (xX) for n 2 N likewise implies 
n 
that given 77> 0 there exists an e > 0 depending only on 7 


such that 
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Camm y= Al < ee ==> ft (X (wW)) - t (X){ < 7/72, 
n hn on n 


for each w€ %. Combining (21) and (22) yields 


(23) Poe OOM at Soe ==> 1t (X (@)) -— t(xyi < 7. 
n n on i 


Now by Egoroff's Theorem (19) implies that for each 65 > 0 


there exists a set As CaS wait i P (Ay) > 1-6 #such that 


X (wm) converges uniformiy in wm for every w in Ag 
n 
Evidently then if n 2 alee 


Gee Aa ==> 1X (a) =) Xj] < e. 
O n 


Now since e in (23) depends only on 7 , whenever n 2 N(n) = 


max a) Ve ee) 3 we have 


Oe ee kOe etal ST, 
6 1 
which means that t (X ) -> t(X) uniformly on Age Since 6 
n oon 
is Poatviakye. unis Means that  t (X ) <-> +t(X) almost 
n on 


uniformly which implies (20) because of the equivalence of 


almost sure and almost uniform convergence (see Lukacs 


ie }) . 


Lemma 6 Let {X } be a sequence of bounded random variables 
n 
with 


(24) Wear 0 a<Ss; 
n 
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Then S --> 0 as-S. 
n 


Proof: 


Because of (24), given e > 0 there exists a set A cS with 
2 


P(A ) = 1 Such that 
e 
WE A ==> [X (@W){ < e/2 
= n 


tEOjmeall n 2 N(e,@). Now for t > 0, 








25 : 1S x | 
= ce) 
(25) iS (@) | = 
cate | Ax 
< 1) 
N+t q= 1-9 
: | Het “ | 
+ @ 
NFE jaf 5 
Now we take 
(26) C(N,@) = a Pyort@) ji |<) Ss 
ns n 


this follows from the hypothesis that {X } is bounded, but 
n 

the lemma will hold for any sequence satisfying (26). Now 

(25) becomes 


(Sends MN C(t,o) + “t. e 
N+t iy 


whenever we choose t 2 T(e,@w). Thus, 


W€ A ==> ([S (w)| <e 
e m 


Tar 
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a 


for all m > M(e,w) = N + T. Since P(A ) = 1, we conclude 
2 


mirat S <-> O a.S. 
n 


Lemma 7 Under assumptions (F1) and (W1) through (W4) the 


function 
oO 
27) t (x) = 1_ fl ee | AF (y) 
n b a 9) : : 
n se n 
meeuniformiy continuous in some neighborhood of x¥ = s_ for 


every n 2 N. 


Emo OL : 
Suppose in accordance with (F1) that the density £ (x) exists 


eu@ee iS continuous for x «I= fs -A, s +A] for sone 
a a 


A > 0. Following Parzen [28] we may rewrite (27) in the 


form 
t (x) - £(x) = [ ey eee = CC Daan (Y D> h) dy 
n [vi <6 eta n 
+ ye 1 Wi x - dF 
n n 
Steere (OX) f b-~* W({yb-*) dy 
lyi>d on n 

where x € I and 6 is chosen such that 0 < 6<A. thus7 when 
ae 1, | 


It (x) -f(x) 1 4 su Petx=v) =f ox) | t Hu) du 
n' riED pie lytsé 
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choose a 


e/f3. 


m2 N 


than 
term 
thus 


when 


Theorem 2 d 
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dF (x-y) 


f{ ) 
"Sips o CB ryt 


+ £(x) f b-1 W(yb-1) a 
ly!>d on : n : 


re 76 $ (Gm aie & 
i i? (x) | TES [it (x-y) (x) | 
+ 3 foe, W (Zz) ] ou 


nN 


wat (xX) f WiZineaz .. 
ge a 


Dyce Con tan witty Or of (5) 
Oe mOesuen that the first term will be Jess 
Having chosen 6 we may then select N such that 
(W4) that the second term will also be 
3. (W3) 
Will also be less than e/3 when n is large enough. 
have that 


implies 


finally, allows vs to conclude that the 


sup me nex) = Eb (Xx) 15< eC 
xe -I n 


fete) eee tat x)) LS Unifernly continuous on I. 
n 


-~->f8 Ge De 


PEOOL: 


In view of the bounds (4) 


Pueotiinr1ces to Show that 


(28) 5 
n 


--> £ ae Se 


We first note that 
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way) = 1 wf fo] 
n n 


has a bounded variance whose bound is independent of y: 


be 
erence) < 1 fe lez] dF (u) 
n 


n 


< K2 ferro = 62 . 
bz b2 


n n 


which follows from (W2). Thus, 


0 ora) 
mote Variw } S Ke 3 ‘(n b )-- 
n=1 n@ n n= 1 n 
which is finite by (7). Lemma 2 with c = n then implies 
n 
n 
(29) 1 5 (w. - E[w.{8 yj} --> 0 ass. 
rs ea Jone 
Now 
n 
(30) Deets”, 
n nj=1 j 
n n 
Sire eee Oye tt 2) Ev ol ee) 
nh Dat J Pa n 3=1 J 3 
E(w {8B Je Wf 3 “4 ls 


: ia aF (y) 
J J 


te(si)  ad.S., 
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with t.(*) given by (27). Now Parzen [28] has shown that 
J 


(W1) - (W4) and (F1) imply 


Pine (se) sat (Ss ) = 2 = 
> no va a 


h- 7 © 
Clearly ae 1S measurable and alse) 1S continuous for 
every n greater than some fixed N by Lemma 7 so we may apply 
Lemma 5 to assert 


Bevel | s2> p ~a.S. 
J J 


Now by (W2), IE{w 1B JI S$ K/S/b. < so that (26) is 
3 J 3 

feretied for X = E{w |B .} - g and an application of 
J J J 

Lemma 6 and (29) to the right-hand side of (30) establishes 


(28). 


D. Asymptotic Normality 
We first state a Lemma due to Burkholder (see [3] for a 
proof) and then use it to obtain a result on the convergence 


of S in the guadratic mean. 


oe ue sm se 


Lemma 8 (Burkholder) Let {X } be a non-negative sequence 
n 
of real numbers and {q }, [r } real number sequences with 
n n 
mieine dg = g > p> 0 and lim sup r =r > 0 such that for 
n n 


every n 2 N 
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: +] 
X S(t = 4h) howe ct nr n’ ° 
n+1 n n n 
Then 
oe Se oo | 2 2O.(1 aa). 
n q- 
oO 
Lemma 9 Ef (5 - s )2] = O(n-). 
n a 
Proof: 
In what follows we write s* = S§ - Ss. Expanding (8), we 
n n a 
obtain 
(31) S* =s* - a f[F(s) -atdZ ]. 
n+1 n n n n 


If we expand F(s ) ina Taylor's series about s we then get 
n a 


(32) F(S ) - a= P{s) + (8 ~ 58S) f(s } 
n a n a a 
+e (SS eel 
n a 
= Bs*¥ + §(S¥*) , 
n n 
where §(x) = O(x) as x-> 0 because of (F2). We write 5 


n 
Lor 6 (S*) in wwhat follows. Substituting (32) into (31), 
n 


squaring and simplifying yields 


Z 2 
(33) s* = (1- 2a &) s® - 2a 8 s* (Z + § ) 
nt n n nN i n n 


ug 
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+ a2 (Bs* + § +2 )2 
n n n n 


In order to apply Lemma 8 to (33) we define 


q =2na 6é (1+ Z /S* + § /S*) 

n n n on n on 
Soe 1 +O ()) ) (des. 

np 

=> 2 4a iS 5 


where aan 


ae ease = Dy ene deLEnitLon Of -0(*).. vie 


also take 


ry 
i 


nea2 (Bs* + § + Z )2 
n n n n 


n@a2 ([F(Ss ) - a+ 2Z j2 
n n n 
n 


==> a(te= da) / B 4 a.s. 


We then rewrite (33) as 


ry 


Z g 2 
Ss * = (1 - fh) S * + 
n+1 n n 


ren 
NS 


an application of Lemma 8 with c=2, p=1 then shows that s* 
n 


= O(n-!) a.s. and so we conclude 


Ee eS ee (Tl). 
n a 


We now state a specialization of a theorem due to 
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a 


Fabian [9] to show the asymptotic normality of s. The 
n 


notation ae stands for the indicator function of the set 


wey, 1.e. 


I (a=) 4 x € {t} 
ie} 


0 x £€ {t} 


Lemma 10 (Fabian) Let 8B be a non-decreasing sequence of 


n 
o-fields, 8 <«S, Leta, B 4 ei and T he 
n n n-1 n-1 n-1 n-1 
B -measurable random variables with 
n 
hae — > oe ds Sa, 
n 
B --> b AeSey 
n 
T --> t aes. or EB[(T - t)2] --> 0, 
n n 


with a,b,t € R. V satisfies 
n 


Div |oule=s 0 dase, 
n n 


C> E[V2 - o2| Bj--> 0 a4.s., 
n n 


EL I V2), 5 =a Sacer, 
1 ! {Ve 2 ee wt Ae 
n 


ns 


(34) lege 
no J 


for every e > 0, while U is defined by 


n 
A = 372 
U = [ 1 = op Go 4b Oe ee 
n+1 n n nh nn n 
Then 
eZ L 
n CMe Ne cca: = 472) 400 eDe jo. 
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Theorem 3 5 is asymptotically normal with nean s_ and 
n a 


variance a(i-a)/nB?. 

Proof: 

To apply Fabian's theorem we use the Taylor's series 

expansion of F(S ); putting (32) into (8) and simplifying we 
n 


get 


Now we can take 


RK = $£B/d --> 1 a.S., 
n n 
Baie Ned > G6 -*) als. 
n n 
3/2 
T =n a 6 
n hey eth 


EET 2} =n BEF 6 2 / GZ] 
Nn n n 
--> 0, 


since 62 = o(n ) by Lemma 9. Furthermore, we have 
n 


Piegeioo. | = 0 a.s., 
n nN 


I 


Pecos (Ss i= Fs ) J 
n n n n 
meer Ci A hie, Say 


While the convergence of (34) follows at once from the fact 


that Z is bounded. Thus we conclude from Lemma 10 
n 


a2 





Non-parametric Quantile Estimation Through 
Stochastic Approximation 


L 
(35) n (Sans )o=-7> N f 0, a({i-a) 8 -2 }. 


To show thet 4d also has an asymptotically normal 
n 


distribution we need a Central Limit Theorem for the sum of 
a sequence of dependent summands. For a proof, see Loeve 


feo, p. 3/7, Theorem C. 


Lemna 11 (Loeve) Let {X } be a sequence of random 


— n 
n 
Mapiadoles with S =1 > xX. ff 
Vd a 
(i) Wee Xs ae «gs ]] Su sats + 
n 1 n-1 a 
n 
(ii) Var[S J = 1, & ELx2]} = o2 < ow, 
n n= j=1 5 n 
ni 
(ceri) a 2 E[ [EB {xX21% pmgkee ho amt ke} f | =r? 0, 
ne j=1 yo et a heal 5 
and (iv) fOEsecachye > 0, 
n 


E[ 


1 i Xe Pm am Or 
n@ 521 (1x, 12<) | eh 


then S has an asymptotically normal distribution with mean 


n 
O and variance o?. 

; n 

o 
Theorem 4 d has an asymptotically normal distribution 
n 
: : i= | 

with mean gf and variance O(n ). 


Proof: 
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From (30) we have 

n 

Cees COWIE oot t 
ae) ase 


where the second term converges a.S. to 86. In order to 


apply Lemma 11 to the first term we define 


Vv = wW = Ei w B s 
k k K 
eroarly, 
E[ v B = 0. 
( | ] 
a6 E( v2] B = a w - Efw B 2 B 
(36) BONS me! et Ella 2) J 


21 B = t (s B 
ae KS 2 By ore ad 
RPt2 (3s By, 
+ Ef Sea 2 


where we have used the fact that 


B Ss o% 
Bbw al ee aes 


from Theorem 2. Also 


E[ w-1B 
! as K b2 


tt 
=) 
=, 
8 
= 
N 
a | 
in 
i 
i «< 
es | 
fu 
= 
-«~ 
hes 


Simplifying (36) then yields 


it 


2 B Ss - t2(s 
ae k } ay sre k oe 
= §@ (S ). 


Nemwerarzen { 28) has shown that 
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lin 89 (s ) = b-! £(s ) frro du; 
n->o nn a n a 
we note that W2(u) du is finite by (W2) and (W3) but that 
the limit diverges because of the definition of b (7). The 
n 


proof of Lemma 7 may be extended at once to Show that 98 (s ) 
fie a 


is continuous (at least for all n greater than sone fixed N) 


so an application of Lemma 5 shows that 


Ef v2! B --> bri £ We d aS 
[v21B J : caf (u) du aes 


Now we conclude 
E[v2} = EL E[v2|6 
L - [ Ef k = ] 


--> b-! f(s ) few du, 
k a 
so that 


n 
Bev eae =— > ot (S>) foro pel ees 
J a at J 


WMS 


nz 5=1 


; : oS 
The summation clearly converges; if infact b = bn , 
n 


: ag. < je an application of Euler's summation formula shows 


fats W2(u) du on 
ase he ey 
n bne j=1 


12 (u) d —1 -2 
ome a, ont 
b (Tt gy 


eee 7 
——— 
a? 





Chapter EIT. FINITE SAMPLE CONSIDERATIONS 


In this Chapter we descrihe some methodological 
considerations in quantile estimation uSing both order 
memtastic and stochastic approximation estimators. The 
Smohasis throughout 1S on practical application of the 
technigues in finite samples of data rather than on the 


asymptotic theory of the first two Chapters. 


It has long been known that the finite sample behavior 
of the basic stochastic approximation quantile estimators is 
seriously flawed from a practical viewpoint (Cochran and 
Davis (4); Wetherill [36]; and Davis [6]). Siyces tic 
problem of finite Sample analysis of stochastic 
approximation estimators 1s analytically intractable we rely 
for the most part on digital simulation to examine the 
finite sample properties of our new estimator; it will be 


seen that most of the drawbacks have been overcome. 


The asymptotic distributions asserted by (1.6) and 
(1.7) for order statistic estimators and by (1.15), (1228) 
and C2550) for the various stochastic approximation 
estimators may fail to describe the actuai distribution of 
the estimator for some given n either because this actual 
distribution is markedly non-Gaussian in shape or because 
its mean and Variance deviate appreciably from the 
Pmeeretical values. In this Chapter we are for the most 
baec concerned swith the first difficulty, leaving the 
discussion of estimator bias and mean squared error for 
Chapter IV. 
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mame raer Statistic Estimators 


feeekasic considerations 


Meompoimmeed. Out in Chapter if, the order statistic 


e e rant se e e 
mamerle estimator s for the a~quantile is given by 
n 


With wu = f{ a(nt1) J. Unlike the stochastic approximation 
case, here we need not rely on the asymptotic normality of 
8 fo jomtain a confidence interval on Ss :; non-parametric 
n a 


confidence intervals nay be constructed fron the 


relationship (David [5}) 


Vl 1 = 
1 208 X £<s S$ X = ny. 2 3(i-2 : 
hs ees a 10). a me 
This formula may be evaluated using a tabie of the 


incomplete Beta function (see, for example, Kendall and 
Stuart [17]}; however, direct use of the relation (1) is 
impractical and unnecessary for choosing the values of t and 
v for large sample sizes n since suitable values for given n 
and a may be obtained by using the normal approximation to 
the binomial random variable. For a 100 p # confidence 


interval we have 


t = a(nt+tt) - fa(T-ayn u 


2 
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and 


v = a(nt+1) + Ya(T=ayn u , 
P 


where u is the upper 1 - - PIcignreteance DoOINntlOr Ya unit 
P 


normal variate. To obtain a conservative interval, we round 


t down and v up to the nearest integer. 


The quantile estimation problem may then ba reduced to 


finding three order statistics X X cpr etemds 


X and 
t) (u) (v) 


does not require that the entire X sample be sorted nor need 


we save the entire sample. In fact, just a bit more than 
an sample values (or (1-a)n values for a > 0.5) must be 
stored. The three order statistics may then be found py 
applying Floyd and Rivest's SELECT algorithm [11] which 
requires an average amount of work proportional to n. This 
then represents a substantial computational advantage over 
the naive method of sorting the entire sample, as well as 


decreasing the memory requirements somewhat. 


There remain, however, several serious shortcomings to 
the order statistic method. First, if more than just a 
Single quantile must be estimated the memory requirements 
Will probably increase drastically and the amount of work 
also increases quickly. For the simultaneous estimation of 
the 19 quantiles of Table I it will still be necessary to 
store the entire sample and the work needed to find the 57 
order statistics of interest will be comparable to the 


effort required to sort the sample as a whole. 


This may be shown to be the case by considering that 
the number of comparisons between observation values 1S a 
rough measure of the total amount of work reacuired to sort a 


sample (or Porrind the order statistics of interest). The 
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SELECT algorithno [11] requires an average of about 


mer) + Din(a, 1-a) ] comparisons to find “ S Ope at se eranciacnig 
u 


the median requires about n/2 comparisons. Once the median 


is found, the upper sample quartile (i.e., 0.75 order 
Statistic) must be found in a set of data which is only half 
aS large as the original sample (this is a result of the 
sorting method employed); this requires n/8 comparisons, on 
the average. Proceeding in this manner, we find that 
determining ail 57 order. statistics will take about 15 n 
comparisons; a complete sort, on the other hand uses about 2 
n in n comparisons (see Knuth [19]). The advantage will 
then be with the complete sort for values of n less’ than 
1500 and the amount of work will be about the same for 
ieee <n < 10,000. 


Since order statistic estimation is not basically a 
sequential scheme, a second shortcoming of order statistic 
estimation arises when it is found that a larger sample is 
needed, perhaps because the eStimates in a Sample of size n 
are not precise enough or perhaps because more data become 
available. If one wishes to take advantage of the savings 
possible in storing only an of the observations one must 
fix the value of n in advance. When a larger sample is to 
be investigated it will not in general be possible to find 
the exact order statistic of interest in the pooled sample 
unless all of the discarded data from the original sample 
can also be reviewed. Furthermore, the operation of the 
SELECT algorithm will still reguire an amount of time 


proportional to the new (larger) sample size. 
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2. Decreasing the storage - Payne's method 


The most serious difficulty with order Statistic 
estimators is the inescapable linear growth in storage 
requirements with sample size. For this reason, a technigue 


due to Payne [29] may be considered. A value m < n is first 


chosen; Payne shows that @ may be proportional to Yn. An 


array of size m is set aside and filled with the first o 
observations on X. The array is sorted and then, using (i), 


meeeconrcidcence interval on  s5s TSareobtatined. Observations 
a 


mimomde the confidence interval are discarded and new 


observations are obtained to fitle the “diay. Any 
observation which does not fall within the confidence 
interval is counted toward the total number of observations 
Ditets not put into the array. When the array iS again 
filled it is sorted in place and a new, narrower confidence 
interval is chosen. (The new interval iS narrower in the 
sense that it is shorter than the earlier one, but it will 
have the same probability mass from (1) Since it 1S based on 
aeeelarger sample. Note that it will in general have nore 
Observations than the earlier interval.) The procedure is 


repeated until all the observations have been examined. 


The main drawback to Payne's method is that if the 
initial confidence interval is not wide enough the technique 
may fail to cover the reguired order statistic when the 
entire sample has been examined. For this reason, the 
technique should probably be employed with extremely 
conservative confidence intervals - say 4-5 standard 


deviations - with the actual desired confidence interval 
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chosen at the final step. For example, to determine the 
median of a sample of 106 observations with very low 
probability of failure a total storage requirement of sone 


8000 observations should be ample. 


The estimation of several quantiles by this so-called 
partial sorting method appears to involve a fairly complex 
Peegorithm, but the method Should be useful for a snall 
humber of quantiles (say two or three) in fairly large 
Samples of data. Although the method still requires memory 
which increases with sample. size, the presence of nore 
observations can often be handled by simply decreasing the 


coverage of the last confidence interval. 


3. Approximate order statistics - Averaging 


Another possible application of the order statistic 
method is to consider the KX sample in sections of some fixed 
Size, say 100 observations. We can then choose Y. = X 

1 (100a) 
in section i. The final estimate could then be the average 


G@eeptne Y 'S OF We May Once again sort the Y sample and 
i 


choose an appropriate order statistic as an estimate. If 


the second technique is adopted one may obtain yet another 
level of sections of the Y order statistics and then choose 
s we call this a "nested" method. Both the 


Be 4 
i (100a') 


average and nested methods can be thought of as approximate 


order statistic methods since they do not find the actual 
order statistic in the entire sample but rather a value 


Glose to it. 


The chief drawback to the averaging method 1s that 


there may be appreciable bias in the Y values if these are 
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drawn fron samples small enough to be practical; Table II 


indicates some results for extreme quantiles from several 


Quantile Bias for Sample of 
P@stEibution Alpha Value 100 1000 10000 
Exponential aS 05693 1 =o OU = W005 eo Gt Oma 
0.9 2.3026 - .0442 -.0045 -.0004 
0.99 4.6052 -.4175 -.0487 -.0049 
0.999 6.9077 -.4223 -.0491 
Normal 0.5 0.0 -.0125 -.0013 -.0001 
0.9 P2316 -.0320 -.0033 -.0003 
0.99 223263 -.1732 -.0206 -.0021 
0.999 3.0902 -.1361 -.0158 
Uniforn 0.5 0.5000 -.0045 -.0006 -5X10-5 
0.9 0.9000 -.0089 -.0010 -.0001 
0.99 0.9900 -.0198 -.0020 - .0002 
0.999 0.9990 -.0010 -.0001 
Cauchy es 0.0 -.0159 =. U0 = ONO 
0.9 3.0777 -.0098 - .0010 -.0001 
0.99 371.820 -.0103 -.0010 -.0001 
0.999 318.31 -.0010 -.00951 
Table Bias of the order statistic quantile estimator 


morevarious distributions. 


= ee ee — <= 


the normal, 


taking 


usual 


sanple 


uniform and Cauchy 


Cases 


median. 


analytically for the exponantial and 


and by 


Gistributions. 
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common distributions. The presence of biaS means that the 
estimator will converge to the wrong value as larger and 
larger samples are obtained. Whether this asymptotic error 
is objectionable or not depends on pragmatic consideration 
of the total sample size available but it would certainly 


seem preferable to adopt an aSymptotically unbiased scheme. 


It should be pointed out that for the 2xponential 
distribution the bias is about 10% of the true 0.99 
quantile value for a sample of 100 observations and about 
1 % when 1000 observations are considered. The normal 
distribution has Similarly poor properties so that guite 
large sections nay be reguired in these cases if bias is not 
to be a problem in the final approximate order statistic 


estimate. 


Usually bias can be removed by uSing the jackknife 
technique (see Miller [27]) but since the order statistics 
are very non-linear functions of the obServations the 
jackknifing eliminates bias only at the cost of a serious 
inflation of the variance. This inflation was found to be 
very bad for small samples by Goodman, Lewis and Robbins 
{14], where empirical evidence demonstrated that the mean 
Square error of the jackknifed estimators was 50 % Ilarger 
than for the ordinary order statistic method for samples of 
from 1000 to 10,000 observations. Moreover, implementation 
of a jackknife scheme is complicated by the requirement to 
sort not only the entire section but also a set of 


subsections. 


4. Approximate order statistics - Nesting 


If we use sections of Size n in an approximate order 


metersoeic nethod and then choose 
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meth vu = fa(n+1)}], then s has the same value as the 
a 


Bemaeentite of the Y values, where 


a = Pp Yous 
ac r{ a8 
= Pp X < 
c{ (a) Ae 
~8 Mage at. 


1=0 


This is just a generalization of the two transformation 
methods of Section I.C. For a nested scheme, then, we 


accumulate a sample of n Y observations and choose 
sl 


with = = Bate) 2 The extension of this technique to 


higher levels of nesting is straightforward. 


The price we must pay for this reduction in the storage 
requirements is an inflation of the asymptotic variance just 
as in the case of the maximum and next-to-maxinun 
transforms; note that the averaging method involves no- such 
inflation as long as the X sections are large enough for the 


asymptotic variance (1.5) to hold approximately. 1 oee 7A 
1 


were taken directly as an order statistic from an ¥ sample 


of ae observations we would have 


Tana = afi = a)seo: 
Er h nh Ss 
Y a 


With the nesting scheme, however, 


64 





Non- ERG oe ateNrs Estimation Through 
tochastic Approximation 


d2a(1c- ad} 
Manca gees 1k YX, 
1 n £2 (s Jj 
Y Y a 


where 


n Get n-u 
f (s ) = Cu] ua (1 =- a) f(s). 
Y a a 


(See David [5].) Thus, the variance will be inflated by an 


approximate factor of 


ny 2 Z aaa yo 


For example, if we estimate the 0.99 cGuantile py 
considering a Y sample generated by taking the 95th order 
statistic in X sections of 100 the variance of an estimate 
based on a Y order statistic will be 1.437 times the 


variance of an estimate taken from the X sample as a_ whole. 


Since a = 0.73576 in this case, we may continue the nesting 
ue 
process by choosing n = 100 in which case we take 
x 
2 = Yay)? the variance will then be further inflated by a 
1 


Mmietor Of 1.566 for an overall inflation of 2.242. We may 
obtain results with the same precision by considering a 
larger sample (asSuming data iS available); in the present 
case, we need a total xX sample of 14,400 to obtain a 
variance equivalent to n = 10,000 in an untransformed case. 
The total storage requirements, however, are now Just 244 
observations - 100 for the X samples and 144 for the Y 
Sample. Similarly, we may deal with a total X sample of 
2,250,000 by using a triply nested scheme with 100 xX, 100 Y 
and 225 Z observations, thus obtaining a variance equivalent 


to n = 1,000,000 in the unnested case. 
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The nested order statistic scheme results in the 
smallest asymptotic memory requirements - 115 In n for 
repeated sections of 100 - but the increase in variance by a 
factor of about 1.5 per level 1S a very serious drawback. 
There is also the problem of deternining the proper sample 
sizes and order statistics at each level - a problem which 
is most easily solved if the sample size can be specified in 
advance. The determination of the bias of the nested 
estimators and investigation of some reasonable way for 
finding confidence intervals are areas for further research, 
but the problem of variance inflation would seem to rule out 
these estimators unless a virtually unlinited amount of data 


is available. 


Asymptotic n = 10,000 n = 106 
Method Memory Size Memory Bias Memory Blas 
pal Sort n 10,000 -.0049 106 -—-5X10-5 
Censored On (pe eu, 107300 =5x 10n> 
Payne's -8/n 100 -.0049 8,000 -5xX10-5 
Average 1000 a OO Ome oH 7 [ea Os 0s 9 
Nested 115 1no 1 Z00 -.0013 300 -.0063 
244 -.0079 425 -.0064 


Table III. Comparison of various order statistic estimation 
methods for finding the 0.99 quantile. Bias values given 
@pe for the exponential distribution. Total samples of 
14,400 and 2,250,000, respectively, are needed to give 
equivalent variance results in the nested method; memory and 


bias results for these larger samples are also given. 
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oa Summary 


A summary of the order statistic quantile estimation 
methods discussed here appears in Table III; biases given 
are for the 0.99 quantile of the exponential distribution. 
Despite the conceptual simplicity and well-understood 
behavior of these estimators, we have shown then all to lack 
some desirable features. If we wish to estimate a set of 
quantiles based on a fairly large amount of data (say 
100,000 observations) order statistic estimators are clearly 


inadequate. 
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Be Robbins - Monro Estimators 


It should be mentioned at the outset that the basic 
Robbins-Monro (RM) process cannot be applied directly as a 
guantiie estimation technique in any practical mathod since 
its properties depend so heavily on the unknown parameter 8 


= a) i.e. the value of the derivative of the unknown 
distribution function at the unknown quantile. The 
properties also depend to a lesser degree on the starting 
value I but the situation is not nearly so critical there. 


Both modifications to the basic RM process considered here 
overcome this difficulty by simultaneously obtaining an 


estimate of s and ££; we thus investigate the RM process 
a 
applied to a known distribution using the optimum step size 


A = £8 mieeorcder Go obtain results which should be better 


than those for-methods which employ estimates of &8. 


1. Selecting the starting point 


Piewiirst problem to be faced “when dealing With ki 


quantile estimation is the selection of the initial guess, 
hy The results of Hodges and Lehmann [15] indicate that 


the bias of the RM estimator is closely related to that of 


+ so that starting with a value which 1s close to s Ls 
a 


‘desirable. We must have BLS < © in order to preserve 
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a 


mean Square convergence. One approach is to take ae pilot 


Sample with perhaps 1000 or 2000 observations and begin R™ 


fmetean Order statistic estimator; a second approach is to 
use a nested approximate order statistic estimator, as 


discussed in the previous section. 


This latter approach is in fact adopted here; since we 
Will for the most part be employing the maximum transform in 
this work, we begin all the stochastic approximation 


estimation procedures by choosing 


X' = max ue reo ee ee 
1 ! [ae 8 
X' = max { X oe, pide” 4 Ok } 
2 vt yt2 2V 
X' = max X xX anaes X 
3 DG) RE ee 
and then setting 
s = xX! : 
1 (2) 


This procedure requires very little computer memory and 
turns out to be very convenient for the simultaneous 
estimation problem; it is adopted in other cases not 
employing the maximum transform in order to have an 
equivalent basis FONG comparison between stochastic 


approximation methods. 


Throughout much of this work we deal with the problen 
of estimating the 0.99 quantile of the exponential 
G@tUseetbution. This case was chosen because it is one in 
which the bias of the order statistic estimator in 
reasonable samples may be objectionable (see Tables II and 


Het) . The exponential distribution is also widely applied 
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@oean empirical model for data and the 0.99 quantile is 
commonly used in statistical inference; thus, this case is 
typical of the contemplated application of our stochastic 


approximation estimators. 


aoeoad 
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Figure 1. Bias of the initial estimate = for the 0.99 


quantile of the unit exponential distribution; v for maximum 
Seansftormation is 56. True quantile value is 4.6052. 


Histogram sample size is 5000 observations on s*. 
n 
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The bias of the initial estimate a for the exponential 


Ono 9 ease 1s indicated by the histogram of Figure 1. The 
meeeogreams for stochastic approximation quantile estimators 


in this Chapter display the bias of the estimators, i.e. 


S*¥ =S -sS , 
n n a 
rather than the estimator values themselves. Pho tas 


Chapter, we use the term bias to refer to the entire 


distribution of s* rather than to E{s*] as is usual. Data 
n n 
for Figure 1, as well as for the other histograms, was 


obtained by sampling pseudo-random numbers from various 
eetributions; these were generated by the Naval 
Postgraduate School random number package LLRANDOM [21] anda 
its extensions [31]. Note that the information of Figure 1 
could have been obtained analytically, but the details would 


be messy. 


The caption for each histogram in this Chapter 
indicates two sample sizes: one (the "X sample") for the 
total number of X observations from the underlying 


population used to compute the statistic (for example, s*) 
n 
whose distribution is displayed and the other (the 


"histogram sample") for the number of replications of this 
Statistic used to compute the histogram and the sample 
Pumtary Statistics printed. WNote that the xX sample size 
Will be larger than the indicated number of stochastic 
approximation steps taken because the X sainnple includes’ the 
3 v values used for the starting point. Also, the number of 
steps taken in the stochastic approximation will be smailer 


than the corresponding sample size by a rfactor of v when the 
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maximum transform is used (or w for the next-to-maximnum 


transform). 


The letters "Q" printed below the histogram and above 
the scale indicate the location of the sample quartiles 
(including the median as the second quartile) while the 
letter "M" indicates the sample mean. The M may be printed 
instead of one of the Q's if they appear in the same column; 


thas phenomenon occurs in Figure 1. 


2. The basic RM process 


‘ 


We begin our investigation of the distribution of s* in 
n 


the RM process by considering the untransformed RM 
estimator, 1.e. one which takes a step with every sample 


value. wWe use the optimum step size A = Oe = 0.01 


for the exponential distribution. The results are shown in 


Figures 2 and 3 for a the distributions are 


s* : 
5601 
Clearly grossly non-normal, despite the asymptotic normality 


indicated in Chapter I. Note that the appearance of Figure 
3 does not suggest much of an improvement despite an 
additional 4480 X observations; the skewness and kurtosis of 


Pmenwestimator are, if anything, increasing with sample size. 


An explanation of this behavior becones clear if we 


consider the effect of the first observation, a Because 
of the negative bias in =p (see Figure 1), the probability 


that i > ae fomelightly geeater than 0.01; this means that 


about 1.5 % of the time the second quantile estimate is 
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YI 
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S, Teor oo / {0700 4) 


S pee Oa One 
1 


This is obviously much larger than the true quantile value 


of 8.60 so we expect that all of the observations on X will 


be less than S with high probability until the estimate has 
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Figure 2. Bias of the RM stochastic approximation estimator 


2 ee for the 0.99 quantile of the exponential distribution. 
Maximum transform was not used. X sample = 1268 
Observations; histogram sample = 2500 replications of s* 


ihe 
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reacied a reasonable level, perhaps 6.0. his [ane euch 
requires that the RM process take downward steps for about 
90 units. These downward steps are proportional to 1 - a 
according to (1.10) and in this case are exactly equal to 
1/fn. The value of n such that 
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Figure 3. Bias of the RM stochastic approximation estimator 
for the 0.99 quantile of the exponential distribution for an 
X sample of 5768 observations; maximum transform was not 
used. True value is 4.6052. Histogram based on 2500 


observations on s* 
5601 
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is about 2 XK 1039 so that the RM process will in this case 
have 1.5 #£@ of its distribution in the extreme right-hand 
- tail at a substantial distance from the true quantile value 


for any reasonable Sample size. 


An additional 4 % of the quantile estimates will also 
move upwards a distance of 49.5 units after having taken the 
first step down, while 5 % and 8 %, respectively, will take 
the third and fourth steps upwards. Thus, nearly one-fifth 
of the time the RM process will be over 20 units from its 
Starting point (and from the vicinity of the true value) 
after only four observations. This then accounts for the 
appearance of Figures 2 and 3; a similar situation exists 
with random Samples from a wide variety of parent 
populations, i.e. it is not particular to the exponential 


distribution. 


3. The gain sequence shift 


What is needed is a way to decrease the size of the 
first few upward steps without changing the asymptotic 
behavior of the RM process. This can be done by using the 
gain sequence 

(2) oa eee 
n (ntkp x 
instead of the 1f/n sequence of (1.11), where k 1S some 
positive constant, referred to hereafter as the shift 
constant. The proofs of Dvoretzky [7] and Sacks [33] allow 


for gain sequences of the form (2) and so we preserve the 


almost sure convergence and asymptotic normality of Ss. 
n 


For the exponential 0.99 quantile case a k value of 98 


would reduce the initial upwards step to a reasonable size 
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of 1 unit; from this point we need to move down a distance 
of only about 0.9 (on the average) to reach the true value 
of s weeluecen. Value such that 
0.99 
ne99 
1=1T00 


Thymes Ores) 


is 146 so that there will be no difficulty in reaching the 
close proximity of the true value given a reasonable sample 


eZ e. 


Since the initial estimate = is actually based on a 


Sample of 168 X observations, we adopt a shift constant k of 
eves the resulting distribution of ee 1s shown in Figure 
u. The data from the X population for this Figure are the 


Same aS in Figure 2 with which Figure 4 should be compared. 
Serearly the introduction of the shift constant has greatly 


improved the finite sample properties of the estimator. 


Under more general conditions we wish to determine a 
gain sequence shift k such that the effects of a bad initial 
step can be reversed in a reasonable number of additional 
steps. Assuming that a > 0.5, the "bad" direction is upward 
and the initial step is a / B (k+t1), using the optimum step 
divisor A= 8 Writing j for k+1 we must then find an no 


large enough that 


ob 
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iY 


Table IV shows values of n for various values of a and j. 
It is clear that using a shift constant of 100 to 200 may be 
Meetul for 0.01 < a < 0.99. 


Another interpretation of Table IV is also possible: 


the entries show the Minimum number of additional 
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Figure 4. Bias of the RM stochastic approximation estimator 


o> for the 0.99 quantile of the exponential distribution 
1 


using a shifted gain sequence with k = 167. Maximum 
transform was not used. X sample was 1288 observations; 
histogram sample size = 2500. 
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observations needed to overcome an incorrect step upwards at 
stage j of the RM process. Note that as j increases this 
number of steps approaches a limit which is approximately 
a f/f (1 - a). This means that the RM process tends to remain 


in the vicinity of the true value s_ once it has reached it 
a 


Since here it will take a steps down on the average for each 


fered Steps upward. 


5 Oiiantalesrevel,. a 
(k+1) Oe75 02200 C2390 0.999 Unies tep 
1 30 2302 2X1043 10434 3 
Z se) 225 2X10e1 10217 5 
3 68 8X1014 10145 7 
4 6 39 2X1011 10109 8 
5 5 28 2X109 Sages 10 
10 uy 16 2X105 2x0 * a 
20 | 12 2874 1x19023 36 
50 4 10 316 2X1010 87 
100 4 10 170 2X106 173 
200 mn 10 129 29310 345 
300 ui 10 118 S0o5 517 
500 4 10 110 Bio | 86 4 
1000 4 10 105 ta 1720 


Table IV. KXumber of additional observations reguired for a 

Shifted stochastic approximation method to reverse an 

initial unfavorable step. The shift constant is one less 

than the entry in the first column. The entries may also be 

interpreted as the number of observations needed to reverse 

an incorrect step upward at step j. The last column gives 
Nees 


the value of n satisfying 5 clogs ae ee 
1=j+1 
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We - thus see that estimating the Stochastic 
approximation starting point S. by an order statistic method 


from an initial sample whose size is reughly proportional to 
a f/f (1 ~ a) and then beginning the RM process with a shift 
constant k =a / (1 - a) will avoid most of the serious 
instabilities of Figures 2 and 3. An interesting feature of 
this result is that it is distribution-free in the sense 
that the optimum step size multiplier 1/ 8 does not appear 
in an explicit way. However, whether shifting the gain 


sequence will result in an effective estimation procedure 


depends on the bias of = as well as the properties of the 


random variable X whose quantile we are estimating. 


For example, if the random variable xX is widely 
dispersed it is quite possible that the RM process wili take 
two or even more steps in the wrong direction. Since the 
harmonic series on which Table IV is based grows 
logarithmically the effect of several Such incorrect steps 
may require many times the sample sizes indicated to 
overcome. The typical shape of the distribution of 
stochastic approximation quantile estimators is that of 
Figure 4; the long tail to the right is made up of 
estimation seguences which are in the process of correcting 


multi-step errors. 


If there is an appreciable bias in oF then a large 
Shift constant may seriously impede the convergence of the 


estimator to the near proximity of s . The biases indicated 
a 
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in Tables II and III in some cases are large enough to cause 


jieervculties here and the order statistic estimators used to 


obtain the initial estimate =e estimators are subject to 
considerable sampling variation. If the initial sample size 
mor Linding 2 is Ate then on asymptotic grounds from (1.7) 


the initial variance is 


which might be inflated somewhat if a nested scheme is 


used. Since oS = a f/f (1 = a), the initial standard 


deviation will be 


which is a times the size of the first downward step. Thus 


if .the initial estimate is just one standard deviation 


high we need a sample of at least n observations to overcome 
this, where 


nej 


2, NR Aas 
i=} Bi B 
or 
nae 
(3) eee >) 1. 
1=j 
The last column of Table IV gives values of n Satisfying 
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3) . 


In a given case it is thus possible that both the bias 


and the sampling variation of S will combine to produce a 


Starting point which is far from s. If this is so an 
a 


unreasonably large sample may be needed to obtain a nearly 


@arsstan GLStribution for 5S when a is close to Q or 14. The 
n 


long tail of Figure 4 is at least partially due to this 
phenomenon, especially in view of the skewed distribution 


oe Figure 1. 


Oe Maximum and next-to-maximun transforms 


The only way to overcome this problem is to transform 
the a values being used to values closer to 0.5; this of 
course can be done by means of the haxinun Or 
next-to-maximum transform methods of Chapter I. In the 
context of our present discussion, it 1s clear that these 
transform techniques work because the effect of steps in the 
wrong direction can be readily reversed. Examples of the 


maxinum transform oy! and next-to-maximum transform oe 
used for estimating the 0.99 quantile of the exponential 
distribution are Shown in Figures 5 and 6. The theoretical 
(asymptotic) variances of 3 for these Figures are .1242 and 
-1848, respectively, which compare well with the observed 


values of .1431 and .1842. The distributions in both cases 


are normal or nearly so. 


Examination of Figures 4, 5 and 6 (together with a 
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deal of data from other distributions and quantiles) 
that the 


great 


leads to the general conciusion distributional 


of the stochastic approximation estimator .S are 


n 


properties 


greatly improved by these transformation schemes. The 


che ate 
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Figure 5. 


for the 0.99 quantile of the exponential distribution 


the 


maximum 


tratvsceorn with iv 


Bias of the RM stochastic approximation estimator 


using 


> oie X sample is 1232 


observations; histogram based on 2500 replications of s*'. 
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4 


next-to-maximum method seems to result ina more nearly 


GausSian shape (as measured by the sample coefficients of 


skewness and kurtosis) for the distribution and agrees more 
both 


quite satisfactory results even in relatively 


closely with the asymptotic variance, but transform 
methods 


Small samples. 


give 
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A further advantage of the transform methods is that 
they involve less computational effort than does the 
untransformed (direct) technique. In fact the computation 
time for the untransformed case is over four times that for 
emther transform method. Thus if the X sample is being 
generated by a pseudo-random process within the computer it 
may be more efficient computationally to use one of the 
transform methods despite the variance inflation which 
requires us to generate a larger X sample for the same 
estimate precision; the time saved in the estimation 
procedure may be sufficient to offset the generation time 


for the larger sample. 


5. Direct application of the RM method 


In the previous Subsection we used a fixed step size 
A = £8, chosen so aS to give the best asymptotic variance. 
As indicated earlier, the RM process cannot be applied 
optimally (i.e., with minimum asymptotic variance) in any 
real situation simply because we do not know the actual 
value of 8. If a reasonable initial estimate of 8B can be 
found, however, it may be possible to use the RM process 


directly for quantile estimation. 


This was done in the work of Goodman, Lewis and Robbins 
{14] and also by Yuguchi [38]. They used the same starting 


value as in the present work, but with a random A value 


given by 
x! we Xx! 
i Se Se : 
B(x' - x! Kt o- xt 
( €2) cr! C3) Zea 
This A is used for all steps in the stochastic approximation 
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estimation process as opposed to the Venter method and the 
new method which use a dynamically changing A _ value. A 
second instance in which direct application of the RM 
process was attempted 1s given by Iglehart [16]; in this 


case a fixed estimate of fis ) based on the empirical 
a 


distribution function was used. 


Now the convergence of sS*' to a limiting normal 
n 


distribution with variance O(n~!) requires that we have A < 
28 (sacks [33 ]). This will not in general always be the 


case for A or for any other estimate of 8. It 1s known 


that the convergence may be much worse for A 2 28 ; for 


example, when A = 28 the variance is O(log n/n) (Major and 
Revesz {26 }). Thus, the stochastic approximation process 
With a fixed gain sequence multiplier may result in very 
poor convergence properties even if the distribution does 


not blow up as in Figures 2 and 3. 


In particular, the results of Yuguchi [38] tndicate the 


~1/4 
presence of an O(n ) component in MSE[S‘']; also, the 
n 


Sample coefficients of skewness and kurtosis of the 2M 


estimators increase with increasing sample size rather than 
decreasing as we would expect if the distributions were in 
fact approaching normality. The RM quantile estimators were 
also found to give “erratic results" by Iglehart [16] and he 


recommended that they not be used. . 


It is possible that these results could be improved if 
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a density estimator with better properties than A or the 


derivative of the empirical distribution function could he 


found. A possible candidate is just the kernel estimator of 
fas 32) ; see Rosenblatt [32] or Parzen [28]. We prefer to 
use a method which 1S guaranteed to have the minimun 
asymptotic variance, however, and so in Section III.C we 


turn to techniques which have this property. 


6. Summary 


The general conclusions of this Section are that the 


nested method for selecting oy is sufficiently robust and 


that the maximun transform is a conputationally and 


statistically effective technique for RM quantile estimation 
for well-behaved Xx populations. The next-to-maximun 
transformation and the gain sequence shift are also useful 
and may be necessary in some cases to increase the 
robustness of the RM process. Finally, the finite sample 
and asymptotic properties of methods using random values for 
the gain sequence divisor A will be much better if those 
values converge to the optimum value gg rather than 


remaining fixed. 


C. Venter'ts Estimator 


With Venter's method we enter the realm of techniques 
which can be applied to real estimation problems, i.e. those 
in which & is unknown. Seneral experience with the Venter 
estimator, however, shows that it is not very robust and 


eften tends to blow up. 
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1. Choice of parameters 
The first guestion to be addressed ina practical 


implementation of this stochastic approximation method is 


the choice of the finite difference sequence {c }, which 
n 
meom (1.22) is given by 


SG 
(4) Se ih - Uae ea SOs 0s 


é - 
In order to avoid the necessity of computing nan at each 


step of the estimation process (this requires a logarithm 
and an exponential to be calculated) we adopt instead the 


sequence defined recursively by 


(5) 


This sequence requires only elementary arithmetic operations 
and may be generated about 100 times faster than the 
sequence (4). 

The properties of f{e } may be readily found. First we 

n 
note that e > O and that e cee for all n, i1-¢€.. the 
n n+ n 

sequence is bounded below and monotone decreasing. At stage 


n suppose that 


-1/3 ~4/3 
+o cn 
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then using (5) we have 


- - ~1/3 -4/3 
= - + 
eo si? oe 
-1/3 ~4/3 
= (n + 1) Z to 7 ( Nn f ) - 


Thus taking c = ce results ina Venter process with r = 
n n 
1/3; Venter's proof [20] allows for gain sequences of this 


form. 


Selection of the modulating constant cis the next 
problem. Intuitively it seems that c should be larger when 
the X population is more widely dispersed in the vicinity of 


Ss; thus c = 1/8 would be a reasonable choice except that 
a 


8 is usually unknown. We might thus decide to estimate £ 


from the same initial sample as Se and so use a random value 


for cor else choose a reasonably robust fixed value for c. 


It turns out that the behavior of the Venter quantile 
estimator is bad regardless of the value chosen for c. The 
selection of c, however, does not seem to influence the 
estimation process as much as the bounding process (1.25) or 
i. 2c?) s Venter's convergence proof required that the 


estimate A of 8 be restricted to the interval (a*,b*) [37] 
n 


While Fabian [9] showed that we may take 


(6) 
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Et has been found empirically that in most applications only 
the lower bound a* is essential, though the upper bound 
improves the estimates Somewhat. Following the discussion 
of the previous Section we can understand the function of 
the lower bound as limiting the size of the steps which we 


allow the Venter process to take. 


we may generate the bounds (6) by using the {fe } 
n 
seguence (5) with the multiplier - for a* and the sequence 


{H } defined by 
n 


n e 
(7) H = 2 1S) 
With the multiplier cs for b*¥. It 1s well known that 


Hee en et Veo (ns) 6), 
n 


where Y= 0.57722 is Euler's constant; this approach is 
about 20 times faster than computing the logarithm directly 
but still preserves the asymptotic behavior required, for 


example, in the proof cf Theorem 1 in Chapter II. 


2. Simulation results 

Considerable Simulation effort was devoted Lo 
investigating optimum values for c, eA and oor in general, 
it was found that the Venter estimator is not very robust 


when random values are used and that it is difficult to 
select fixed values which give good results in a variety of 


applications. Figure 7 shows a typical example of the 
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Venter estimator with c = oe = 1 and & = 2 applied to the 
0.99 guantile of the exponential distribution. It was found 
that increasing the value of . decreased the spread of the 


estimator somewhat while altering the value of - seems to 


have little effect on the distribution of S!. 
n 


a on ee ee ee ee ee ee ee ee ee ee eg eee i sa 
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Figure 7. Bias of the Venter stochastic approximation 


quantile estimator for the 0.99 quantile of the exponential 
distribution based on an X sample of 5768 observations. 
Maximum transform with v = 56 used. Histogram sample = 2500 


observations of s*! , 
101 
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Increasing oo does improve the distributional 


properties of the Venter quantile estimator but only at the 
este Of introducing considerable bias into the estimation 
process. In fact, the Venter estimator seems to be 
particularly bias- prone. In pseudo-randoun Sampling 
experiments in which several quantiles from normal, uniform, 
exponential and gamma populations were estimated it was 
found that the Venter estimators had biases which were from 


50 to 1000 times as high as those of the RM estimators. 


A further drawback to this method may be seen in Figure 


8 which displays the density estimate A' obtained in the 
n 


same sampling experiment as the quantile estimates of Figure 
7 (the notation At indicates that the estimate 1s based ona 
n 


maximum transform scheme). The negative estimate values for 


B = f(s) are quite common for the Venter procedure, but 
a 


they prevent us from obtaining any reasonable estimate of 


the variance of St. We denote Var{S'!] by o2 and based on 
n n n 


the asymptotic theory we estimate this variance by 


(8) i va (i> a 


2 
n tre 
nN 


where v is the size of the maximum transform sample. 


Normally, the larger At is the less variable 1s S' but when 
n n 


A' < 0 we can say very little about o@. 
n n 
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Note that the appearance of Figure 8 1s guite Gaussian 


and -that the mean of At is,.very close to the theoretical 
n 


value for the exponential 0.99 guantile 


= 








B' = va f(s ) 
a 
Ono 22 ot 
——— SS ee ee ee ee ee a ae ae a ei eee ee ee ae ae oe 7 
{ 
147 + x + 
ce 
| + | 
| Hk KK | 
HO OK ok 
110 + HK ME OK 3 he ok AK ok + 
| Xe feck ok ak oak ak ok x | 
ROKK IO ok ok ofa teak ak ok 
| RK kK ko oe otek kak ok | 
eK ok ok Se tak oho ake 
74 + Mee a ok eo oe oe ok oe a He Ok + 
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Figure 8. Venter estimator Af° f(s ) for the 
100 O23 
exponential distribution based on the same experiment as 
Figure Ree True value is 0.3222. X sample = 5768 
observations; histogram based on 2500 observations of aoe 
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* 


The distribution of At thus agrees with the asymptotic 
n 


results of Venter [37] but the negative values are 


unacceptable for the determination of confidence intervals 


or for assessing the variability of S'. 
n 


D. The New Estimator 


1. Choice of parameters 


Romedce tne mevpestimator of Section I.E and Chapter Ii 
we must first decide on a number of parameters, just as in 
the Venter case. These decisions include the choice of a 


kernel function W(e) and a bandwidth sequence {b } as well 
n 
as the specification of the bounding method (2.4), analogous 


to the interval (a*,b*) for the Venter process. 


Considerable experience with density estimators, both 
in this thesis and in [23], indicates that the triangular 


weight function 


9 W = 
(9) ee! 


1- |x] it costes. -l 
0 otherwise 


gives results comparable to those of smoother kernels with 
some saving in computational efficiency. Other kernels 
investigated include the uniforn 

Pima i/20= X= 1/2 
eg 
u 0 otherwise 


which is somewhat unstable and subject to bias, as well as 


the smoother quadratic weight function 
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1.5 ( 1°= x2 ) ex |) oat 


W (x) = 
2 0 otherwise 
and the exponential weight function 


W (x) = 7 eet 
e Zz ; 


All of these functions clearly satisfy assumptions (W1) to 
(W4) of Chapter II and so are admissible for stochastic 
approximation quantile estimation. 
For the bandwidth sequence we again adopt the fe } 
n 
sequence used for the Venter case. Selection of b = b e 


n n 
satisfies (2.7) with g = 1/3; once again, the savings in 
computation time make the use of the fe} Seguence very 

: n 


attractive. As an alternative vwe might use the sequence 


fe'} based on the recursion 
n 


ef = 1 
1 
ate 
e' = | = ) e! 
n+ 1 a ni 
, -1/2 | 
Which may be shown to be O(n ) - Since excellent results 


were obtained with {e } this other sequence has not _ been 
n 


investigated. 


Selection of the bandwidth multiplier b must take into 
account the spread of the random variables. If too small a 


value is used it is unlikely that any xX observations will 


fall close enough to the § values to make a contribution to 
n 
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the density estimate. (Recall that 


(10) wo = iow [on n 
t 


Will be positive only if {|S -xk | <b.) On the other hand, 
ae n 


if b is too large it is possible that B will be unable to 
n 


increase fast enough in a small sample to reach very large 


Palues of ££. 


Practical experience with the method shows it to be 
quite robust with respect to the choice of b; most of the 
work reported in this Chapter and in Chapter IV was done 
with a fixed b value of 1. In data where the observations 
are more widely dispersed than those considered here, it may 


be desirable to use a random value for b. If the nested 


method is used for finding oe a convenient b value to use is 


xt . 
(3) (1) 


tt 


uSing this value guarantees that further X*' observations 


Will be within a single bandwidth b of S'°. 
n 


The lower bound on the sequential estimate A of B= in 
n 


the Venter process was absolutely essential since the Venter 

technique sometimes results in negative A values. If these 
n 

values were used, steps in the wrong direction would be 


taken and so a lower bound on the value of A must be 
n 


established. For the new process all of the increments to 


a5 





Non- Beene UE AC een oes Estimation Through 
tochastic Approximation 


4a 


the density estimator 8B are positive, so that once a 
n 


positive estimate is obtained we need not be concerned with 


this type of behavior. We may assure that the § estimator 
n 


Will be fairly stable by setting the initial value of B , 
n 


which we call De to a positive value: either some random a 


priori estimate of $B or else a fixed number. The larger 
the bandwidth sequence multiplier b is, the smaller the 


value for Ey we want to use. We thus set 25 = 1/b whether b 


is fixed or randon. 


As mentioned above, we adopt here the fixed values 


b = Be = 1, i.e. we use the estimate B given by 
n 
B 1 [ 1 ; 
= + W ] 
n n 2 j 


where or is given by (10). Note that this is-equivalent to 
a lower bound with S. = 1andL = 1; although this does not 
Satisfy the requirements of (2.4) the results in all cases 
investigated so far do not seem to call for a more stringent 
method. 

For an upper bound we again adopt the es sequence 
used in the Venter case, using a os value of 1. Although 


the upper bound makes very little difference in most cases 


it seems prudent to use it to avoid any possible instability 


in the early phases of the eStimation process. 
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2. The basic stochastic approximation algoritha 


A succinct description of the estimation process may 
now be given by setting forth its three phases as 
follows. (Note that the same basic method holds for both 
untransformed and maximum transformed estimators.) FOE 


notational Simplicity, we write "mn" for an in the algorithno. 


Ns Initialize. Obtain the initial estimate =P an daet ne 


‘ee Se S89 oe a SS SS SS ee 


bandwidth multiplier m and initialize: 


SaaS ap ieeiee4 © Ul); 
1 
hn = 13 b = m; h = 1. 
Eye Update. For each new xX observation update the 
estimates as follows: 
Pe Densityecgetwe — |S = X{- If t < b increase 


Pee rice it x 5 Suset y — a-! otherwise set y = as: 
If £f£ > hen set d = hen otherwise set d = f (this is the 
upper bound operation). Finally adjust s according to 


S=s+y / d. 


ce COonStan 


Constant 
fee eet 2 17 Nl | n=n + 1; 
bee= 9 () t = be 7 3m2 ) bd. 


s Update the constants for the next phasa: 


as Results. The final estimate of the a-quantile is s. 


wo ey aw ow = ow 


An estimate of Var[S]} 1s given by 


Var{s]j] = (n-1) a (1-a) / ff, 


on 
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while f/(n-1) is an estimate of f(s ). 
a 


The thus requires us to store just five 


Variable values (s, f, n, band h) and a Of 
m). After the kth X value has been used in 


process 
pair fixed 


values (a and 


e £1is k B, nis k+l, bise 
K+ 1 k k+1 


step 2, s has the value § 


bmn 
' 
f 
| 
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and h is H ‘ 
k+1 


To carry out the maximum (minimum) transform with 

e ° V V 
sections of size v, we use the value a' = a (af = (1-a) ) 
mo steps 2.b and 3 and carry out step 2 only for each of the 


section maxima (minima). The estimate of f(s ) in step 3 is 
a 


1 1 
then a Cisljor > {Or £/{v(1-a) (n-1) ] for the mininun 


case}. Here we will require one more constant (v) to be 
stored as well as two more variables which keep track of the 
humber of observations considered so far in the current 
section and the value of the maximum (minimum) value 


encountered. 


3. Simulation results 


An example of the new stochastic approximation quantile 
estimator applied to the 0.99 quantile of the exponential 
distribution appears in Figure 9. fhe asymptotic variance 


for this maximum transformed case is 


(11) Vaoei st] Peed Gd 
n - 7 


Vv 
nf{va f(s )} 
a 


i) 
N 
e 
Od 
OV 
aed 
Nn 


or 0.02362 for n = 100. This corresponds guite closely to 
the observed value of 0.02435 and the shape of the histogran 
also appears reasonably Gaussian. We thus conclude that the 
asymptotic theory is a generally acceptable description of 
the behavior of the new stochastic approximation quantile 


estimation scheme for moderately large samples. 


es, 
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Comparing this distribution to that of the 
corresponding Venter estinate (Figure 7) we see that the new 
method results in an estimator whose properties are much 
more reasonable; the observed mean bias is less for the new 
estimator while the variance is smaller by a factor of 7. 
The distribution also appearS much more GausSian and the 


sample coefficients of skewness and kurtosis are smaller in 
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We conclude that the new procedure is decidedly 


the Venter technique for quantile estimation. 








Figure 10 shows the distribution the density 
estimate B! (or £/(n-1) from the algorithn) which was 
n 
obtained at the same time as the data of Figure 9. Once 
again the distribution appears approximately Gaussian and 
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paguce 11. Bias of the stochastic approximation guantile 
estimator for the 0.99 quantile of the exponential 
distribution based on an X sample of 5768 observations. 


Maximum transformation was not used in this case. 


Sample size = 2500 observation 
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mre Observed mean of 0.3264 is quite close to the 
theoretical value of 0.3222. On asymptotic grounds from 
Theorem 4 the variance should be 
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Figure 12. Density estimate Dea G for the 0.99 quantile of 
the exponential distribution; based on an X sample of 5600 


observations without maximum transform. Actual value is 


wey eee ee oe 


0.01. Histogram sample is 2500 observations of ena 
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which is very close to‘the observed value of 7.812 X 10-3. 


Also there are no negative values of Bt so that all of then 
n 


are admissible as variance estimators. 


The new estimator was also applied to the 0.99 quantile 
of the exponential distribution without using the nmaxinun 
transform; the results appear in Figure it. Clearly the new 


process is far more stable than either the RM or Venter 


methods; the distribution of is very nearly normal 


S 
5600 


with an observed variance (0.02328) close to the asymptotic 
value (0.01768). The density estimate SRA for this case 


is shown in Figure 12; the mean is close to the true value 


of 0.01 while the observed variance of 2.07X 10-5 is also 
Close to the asymptotic value of 1.59 X 1075 although the 
distribution is skewed to the right and does not appear 


Gaussian. 


Despite the results of Figures 11 and 12 we stiil 
prefer to use the maximum transformed version of the new 
process both because it 1s computationally faster and 
because its finite sample properties are generally Superior, 
especially for quantiles more extreme than the 0.99. It is 
nevertheless encouraging to find the new process 
sufficiently stable to avoid the very heavy tails displayed 


by the untransformed RM estimator (see Figures 2 and 3). 
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4. The stability of the new estimator 


An explanation of the stability displayed in Figure 11 
follows if we consider the role of the variable f in the 


algorithmic description of the new method given above. 


Meedlt that £ =hnB, i.e. it is the divisor in the basic 
n 


stochastic approximation recurrence relation. Now £ will 
increase at each step when we use the triangular kernel 


function only as long as the latest X observation is close 
Nh : 


ro. S . If f£ does not increase, however, the size of the 
n 


steps taken by the process will remain the same; we thus 


have an analog to the accelerated process of Kesten [18] 
where the step size remains constant until we have straddled 


the true value by taking steps in both directions. 


The new method is an improvement on Kesten's technique 


because the step size adjustment here is made for each X 


observation. instead of determining that the estimator s§ 


Bye a 


1S in the vicinity of s by looking at the changes in step 
a 


direction we examine directly the relationship between X 
n 


and s§. For example, if = is a long ways from s_~ so that 
n a 


none of the X observations are near S,. for small Jj values 
J 


then the process will take steps of size 1/8 = 1 until it 


U 
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reaches a point where S is close to the latest observation 
. * n 


value Xe Once the 5 values are close to the xX 
n n n 


observations the w terms added to f£ will be positive and SO 
n 


the step size will decrease. 


5. Confidence intervals 


The final area to be investigated here is that of 
applying the new estimation procedure to the determination 


of confidence intervals on s. To obtain a 100 p &% 
a 


confidence interval ons we use 


a 
(12) 5 es | Ieee 
nt 4 / 3h L n p 
where u is the upper 1 - p/2 point of a standard normal 
Pp 
random variable. It would be possible to establish the 


asymptotic properties of confidence intervals estimated in 
this way following the work of Sielken [34]; this has not 


been done here. 


To investigate the finite sample properties of the 
confidence intervals in the exponential 0.99 case, however, 
further Simulation experiments were undertaken. Based on 
10,000 replications the coverage of the confidence interval 


(12) for various p values was as follows: 
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Pp Actual Coveraqde 
0.90 0.8777 + 0.0033 
0.95 0.9265 + 0.0026 
0.99 N97 55 te 020015 


The data of Figure 13 show the distribution of the upper 
95 % confidence limit (with the mean of 4.605165 subtracted) 
femeea Sample of 5/68 X observations. On asymptotic grounds, 
the expected value for this limit should be 0.25271 which 


corresponds very well to the observed mean of 0.25623. 
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Figure 13. Value of the upper 95 % confidence limit for the 
0.99 quantile of the unit exponential distribution; the 
true value of 4.605165 has been subtracted from each 
observation. Estimated by stochastic approximation from Xx 


Samples of 5768. Histogram sample size = 2500. 
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6. Summary 


The new estimator has been used to estimate all the 
quantiles in Table I for random variables from the uniforn, 
normal, exponential, gamma and Cauchy distributions. So 
much data was obtained that it would be impractical to 
attempt to display it all here; the results were, with few 
exceptions, in general agreement with those shown here for 
the exponential 0.99 case. Serious irregularities were 


noted in the Cauchy case; these were due to the infinite 


variance of the initial estimate Sot When the Cauchy 
experiment was repeated with the fixed starting value = = 


0, however, reasonable agreement with the asymptotic thecery 


was obtained. 


The other major limitation fotnd was in using the 
maximum transform for the estimation of extreme gquvuantiles 
from distributions whose densities do not approach zero in 
one or both tails; examples include the uniform distribution 
and the left-hand tail of the exponential distribution. In 
these cases the transformed density B' is very large - 
54.90 for the 0.01 quantile of the exponential distribution, 
for example - and it requires very large X samples for the 


value of B' to increase sufficiently to obtain good 
n 


eoeimates of (soe WKS mest eGEing & values have 
n 


distributions which agree with the asymptotic theory, but 


the too-small density estimates result in confidence 


intervals which are much too wide. In other words, in this 
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“ 


Situation the point estimator of s is satisfactory but the 
a 


density estimate (and hence the confidence interval) is 


relatively poor. 


We conclude this Chapter with the observation (based on 
the above digital simulation experience) that the new method 
overcomes most Ot the limitations of stochastic 
approximation techniques for quantile estimation. The 
asymptotic theory appears to be an adequate description of 
the behavior of the estimators in samples large enough to 
give reasonable variances and we are confident enough of the 
distribution of a that we may use the estimate of aS Bor 


the construction of confidence intervals. 
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Chapter IV. BIAS AND MEAN SQUARED ERROR 


In the previous Chapter we examined the problem of 
finite sample performance of stochastic approximation 


quantile estimates by investigating the distribution of the 


er pee ee ee ee eS ee 


difference s* = S§ - 5S , Which we refer to hereafter as the 
n n a 


bias of the estimator. Considering the distribution of s* 
n 


was done because simply looking at its expected value is not 


sufficient if one is to explain the extremely POOL 
performance on some stochastic approximation quantile 
estimators. As illustrated by Figures 2 and 3, this poor 
performance 1s characterized by very heavy tails and 


exceptionally wide dispersion of s*. By using the maximum 
n 

transform, however, and the new technique of Section III.D, 

we were able to overcome these drawbacks and obtain 


estimates 5 whose distribution is approximately Gaussian. 
n 


Bias is usually taken to be E[S*] and once the problem 
n 
of extremely large deviations has been overcome it is 


necessary to look at bias in this average sense. This is 


because one facet of the poor performance of stochastic 


approximation quantile estimators is that convergence of 
E[S ]} to the true value s is very slow as measured 
n 


a 


empirically even though the estimates are asymptotically 
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unbiased. In fact, sYuguchi [38] found empirical evidence 


~1/4 
that the rate of convergence of the bias is O( a G ) for 


the stochastic approximation estimator proposed by Goodman, 


Lewis and Robbins [14]. This compares with O( n ) for -the 


emaer Statistic case. 


We examine this question here for the new estimator 


through Simulation because no analytical results are 

available or easily obtained. Our goal is to determine 
“1/72 ee 

whether the bias converges as nan as indicated by the 


theory or whether the rate of convergence 1S slower, aS 
indicated by Yuguchi [38]. By developing a model for the 
convergence of the bias, we will be able to compare 
stochastic approximation eStimators with order statistic 
estimators; we may also be able to use techniques Such as 
the jackknife [27] to reduce the bias in situations where it 


is significant. 
A. Description of the Model 

In a general statistical problen, pag 7 is an 
estimator of the fixed but unknown parameter 8 based on a 


Sample of size n then we have for the mean squared error of 


T 
n 


(1) eee (rh eS <3 
n n 
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= (ETT - Oj}? + Var{T ]}, 
n n 
where the first term is due to estimator bias and the second 
to sampling variation. Now it may be that T converges 
n 
weakly (1i.e., in distribution) to a random variable T (which 
is often normal) and also that MSE[T } ~-> M. (In either 
n 


event we may have T suitably normalized, e.g. nT ---> fT.) 
n n 


We may thus choose either MSE{[T] or M as a measure of the 
expected error of the estimator. Hodges and Lehmann [15] 
point out that MSE[T]) <¢. M and that strict inequality is 


possible. 


For the stochastic approximation quantile estimation 


problem, the result of Lemma 9 in Chapter II implies that 
(2) n MSE(S ] --> M2 0 
n 


while the asymptotic normality result of Theorem 3 Shows 


(3) n MSE[S] = aaa ’ 


where S is weakly convergent to S. Now Similar results 
n 


exist for the basic RM process [7] aS well as for the Venter 


method [36]; the asymptotic variance (3) is the same in all 
three cases as long as we select A = $8 for RM. Thus to 
assess the practical utility of any given stochastic 
approximation method which has a suitably Gaussian 
distribution we attempt to measure the value of M which 
results when we sample from a population with known 
properties, e.g. independent and identically distributed 


exponential variates. 
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Hodges and Lehmann £15] have found for a linear model 
of the RM process that the mean Square error components of 
(1) result from a bias term related to the squared error of 
the initial estimate and a variance term related to the 
asymptotic variance. The quantile estimation problem . does 
not satisfy the hypotheses of the Hodges and Lehmann model 
but those authors state that sone Monte Carlo 
experimentation has indicated that their results are fairly 
mopust. We thus begin our analysis of stochastic 
approximation quantile estimation with the assumption that 
the differences between methods will be due to differing 


SBstimator bias. 


In view of (2), we have that the bias of § is 
n 


=1/2 
O( nn ) and we adopt the model 


-1/2 - - 4 
(4) hes je—= oa for an + roo + o( n i; 
n 0 1 2 


in accordance with the Hodges and Lehmann results. (Recall 


eaat s*¥ = § oS). ) We recognize that (4) must be 
n n a 


empirically validated before it can be applied in a specific 
case. Despite the Hodges and Lehmann result, it is possible 
-3/4 -1/2 


that terms of other orders (such as n or n flog n) 


may be present. Nevertheless, this model provides a 
convenient means of assessing the relative bias of different 


Stochastic approximation estimators. 


One possible objection to (4} can be raised based on 


the results of Yuguchi [38] who found that there was a 


a2 
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a -1/4 
Srgnificant n term Ly the bias of stochastic 


approximation quantile estimators. Following Goodman, Lewis 


and Robbins [14], Yuguchi used the basic RM process with a 
fixed random divisor A. The problem with this approach is 
that due to sampling variation A will sometimes be larger 
than 28 and then, according to the results of Sacks [33] 


and Major and Revesz [26], the convergence of S to S_ may 
n a 


-1/2 
be much slower than the n implied by (1.13). Lemna 9 


guarantees that this situation will not exist with the new 
estimator; however, it is prudent to see whether the 


Simulation results show bias terms of a lower order than 


ale 7. ; 
n and also to compare the model (4%) with alternative 
schemes. 

The estimation of a - and Es from specific 


realizations of {S } is a difficult problem because of the 
n 
high degree of autocorrelation within any stochastic 


approximation process, i.e. between § and s§ a The 
n n+ 


general design problem of assessing the model (4) with 


dependence has not been addressed. To overcome this strong 
dependence we generate n independent realizations of the 


process: 
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Vs: s* 
1 

2: s¥, s* 
1 2 

ns StS) eee, S* ., S* 
1 2 n-1 n 


and select as our sample the final estimate value in each 


realization. The result is a sample Sots a eae =o) of 
independent random variables; note that a total of nn 
different starting values and a{noth observations of X fron 
the parent pcpulation are required to obtain each 


independent sample. If we are uSing the maximum transform 


(as we will be throughout this Chapter) each new s*¥*' value 
n 

will be based on v observations of X so that the total X 

Sample wili consist of cote values. We repeat this 


scheme to obtain m independent {s*'} samples; oe will 
= : 


denote the bias of a in the ith independent sample. 


The evaluation of a specific stochastic approximation 
method with respect to bias will then consist of estimating 


the value of a subject to some sort of validation effort. 


(Note that (1) and (2) imply that a = Q.) We then obtain 
the required estimates Brae - and by generalized least 
squares from the linear nodel 


-1/2 - 1 
(5) s* =r +fein + roen + vs: Ce a es er 
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where the v ‘s are mutually independent random variables 
n 


with 


E[v j = 0 
n 


(6) 


Virivee= 0 -/7N . 
n 


In this formulation, o2 is unknown and is also to be 
estimated; one criterion of the adequacy of the model (4) 


will then be how closely we approach the asymptotic value 


2 = 1- co) 
0 ee 


B. A Variance Reduction Scheme 


When uSing the new estimator with the maximun transforn 


to estimate s for the unit exponential distribution, one 


finds that the bias is about -0.007 for K samples of size 
7000 (i1.e., about 125 maximum transformed steps with v = 
56). The asymptotic standard deviation in this case is 
eis? from (3.11). Thus to determine the bias for each 
maximum transformed step to within a sampling variation 
equal to one-tenth of the absolute value of the bias 


requires a total cf 
m = Qi Ae Srey 
C oro007 J ‘ 
replications of the independent {s*'} sequences of the 
n 


previous Section. 


The amount of work required by this naive approach 


leads us to investigate methods of reducing the sampling 
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vaciation of s* without changing its expected value. The 
n 
classical Simulation techniques of variance reduction 


represent an obvious means of doing this; for more details 
on these methods see Gaver and Thompson {13]. The approach 
we adopt here is to define a control variate P which is a 
m 
Statistic computed from the same X sample used to find s* 


n 


and which is highly correlated with s*. The technique can 
n 


be applied with or without the maximum transforn. 


In general we choose aS our control variate P a 
n 
Statistic whose distribution (or at least whose moments) we 


can find. As our estimate of the bias E[s*] we then use 
n 


(7) St JS Po EL P i, 
n n n n 


where E{P j] is known. Clearly 
n 


Ef st] = EL s*] 
n n 


VAG rote mVvabt ot] t Vari P } + 2 Covar[ s*,P. }, 
n n n oe 8 


so that if P is negatively correlated with s* there may be 
n n 


a decrease in the variance. One way to insure that there 


Will be such a decrease is to use instead the value 


(8) Se oteuit2 EP. J} 
n n nh oon n 


where the constant TT is chosen to minimize Var{[st]. Note 
n n 


that the estimate (8} is also a variance :cduced estimate 
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using TT P aS a control variate so we are justified in 
non 


using the same symbol st as for the estimate (7). 
n 


As Gaver and Thompson [13] show, the optimum value of 
TT is given by 
n 
Tl = - Covar{s*,P ] / Var[{P }; 
n on n 
the resulting variance of st is then given by 


Nn 


(9) Vom mova SA COVarl S*yP |] 7 Varl| P } 
n n eee n 


Wy 


Var[s*} (1 -p*), 
n n 


where Oto THe TCOELelat on between S*¥ and P . Thus if P 
n n n n 


is highly correlated with s* we may expect substantial 
n 


improvement in the variance of our final result. 
Of course we will not in general know Covar[s*,P } 
n on 
(although Var[P ] will sometimes be known) and so we are 
n 
unable to choose the optimum value for TT ; we may estimate 


n 


the optimum, however, by using 


AN x =- Hf s* P - EI! P 
(10) TT = ie Pe es 
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% 


where p{s*] is the mean of the m realizations of s*. When 
n n 
A 
we use the ff value given by (10) in (8), however, the 
n 


resulting s+ is no longer an unbiased estimate of E{s*], 
n n 


although as Gaver and Thompson [13] point out we expect the 


bias to decrease with increasing n. 


It has been found that the values of TT do not change 
n 
very much with n, at least not when n is moderately Ilarge. 


Since by the design of the simulation experiment s* | and 
Jsi 
=. _ are based on disjoint X samples for j # k, the value 
ae 
b 
A 


A 
(11) ia. = 2G) 1A. + TT ) sf 2 
n n- 


1 n+1 
will be independent of s* and P and therefore it will not 
n n 
cause st to be biased. Furthermore, for large m values it 
n 


should be close enough to —T— to allow for a close approach 
n 


to the variance reductian (9). 


The foregoing analysis applies no matter which control 


variate P we choose, The art in control variate variance 
n 
reduction lies in choosing a suitable P ; a good choice will 
n 


be easy to compute seguentially from the X sample, will have 
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* 


known moments and will be highly correlated with s*. One 
n 

such choice for the stochastic approximation quantile 

estimation problem is tq use an estinate of the 


Ss ~percentile, i.e. we take 
a 


(12) P = {Number of X values €<§ Ss} fn. 
n a 


Since we are performing a synthetic sampling experiment, s 
a 


is known and from the definition of s we conclude that n P 
a n 


has a binomial distribution with parameters a and hn. 


Furthermore, 
E(P jJ=a 
n 


VensL Ie a =a {l- a) f/f ne 


Now if the observed value of P is greater than a we 
n 


expect the X values in the sample to be larger than usual 


and consequently the value of S to be larger than s . This 
n a 


conjectured positive relationship between P and & (Or, 
n n 


equivalently, s*) is borne out in sampling experiments; what 
n 


1s surprising is the very high correlation coefficient 
observed hetween these two random variables in many 
applications. For example, in the case of the 0.99 quantile 
of the exponential distribution we observe correlations as 
high as 0.90 for moderate values of n; this results in 
Variance reductions of about 80 % based on (9). This in 


turn leads to confidence intervals on FE{[{s*] which are just 
n 
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40 4% aS wide as those obtained using the uncontrolled s* 
n 


values. 


A plot of a joint simulation sample of p and s** for 
n n 
the exponential 0.99 quantile is shown in Figure 14. The Xx 


sample in this case was 5768 observations which corresponds 
to 100 maximum transform steps ({(v = 56). A total of 2500 
replications were generated to produce this plot. The 
computer program used to produce Figure 14 is typical of the 
software tools developed in the course of this research; 
other examples include the histogram Figures of Chapter III 


and the histogram plots of Section IV.D. 


C. Regression Analysis 


For the purposes of analysis we adopt the general bias 


model 
k 
(13) Eee eee Sea 
n iad eet 
where g (n) = 1 for all n and g (n), j21, is some function 
0 : 
of n; for example 
ae 2 
780) = i Be eee 


corresponds to the model (4). 


fo estimate the r 's in (13), we obtain a set of rn 
5 n 


independent realizations of s* for n = L, Ltt, ..., N and 
n 


then use generalized least squares with the relation 
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Non-parametric ee ee Estimation Through 
Stochastic Approximation 


kK 
(14) Sebo = 2 tt g.{n) + v 3 ele Cy cere 
| | nyt JO Jj Jj N51 ; 
Ma ligZ gvo60ecg Mare 
n 
As before, we assume that the v .'s are independent random 
ne 


variables with zero mean and variance proportional to 1/n; 

we choose L large enough that we may invoke the asymptotic 

distribution of s* to claim a normal distribution for v. 
n n 


This will allow us to apply the ususal F and t tests in the 


regression. 


To apply generalized least squares to (14) we multiply 


the relation by JM; the random errors in the transformed 


equation are now independent with zero mean and common 


variance a2, We express this transformed relationship in 


the compact form 
(15) S=Grtv, 


Where boldface lower case letters represent vectors and 


upper case ones, matrices. We define 








cS % c s* % 
L n;1 | 
Ss S* 
s = Ince hi] se Seen 2 fo 
ee @ ®@ nh isla | 
af { | s* 
tL n3m 
eae 
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Non-parametric Oyantile Estimation Through 
Stochastic Approximation 





a 1 . 1 3, (n) ... 3, (n) 
G 1 auete 
G = L+1} + G =Yi VEL Se 
| Stee | n = @ 6 
oe ! | 1 ash) aete eM 


Bes = li gdeveterg el; 


Note that G has m identical rows. 
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The least squares estimate of r is then 


T -1 7 
(16) E-GchG jG rs | 


while an estimate of o2 from the residual sum of squares is 


given by the well-known relationship 
(17) c2 = ss 


where M is defined as the total number of s* observations, 
A | 


eC, 
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Non~parametric OS Te Estimation Through 
Stochastic Approximation 


T 
(18) ae 


tt 

Tw 
Q 
Q 


i 
eo 
\Q 
Lit 

foe 
- 
| a 
ll 
© 
= 
eae) 
= 
® 
e 
® 
= 
~ 


nm g,(n) g.(n) ; 
joy at, j 


di 
note that G G depends only on the model selected and not at 


all on the observed s* values. We also have that 
n 


T 
Gs 
1 een 


My 


(19) Gs = 
n 


Ly Ji BUR NA bilo ne 


In this case the general term is 


N m 
= eae n sx 
ee 5 ae, Wha 7 
N _ ¢ 
eer a St), 
i= 0 eg n 
where pf{s*] 1s the mean of the a observations of s*. 
n n n 


Finally, 
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Non-parametric eke" Apps Estimation Through 
Stochastic Approximation 


st N E 
(20) SS = > SS 
n=L oan 
~ n - Ss 
n=L i=1 nvasi 
> (s*] 
- Ss 
n=L B, 
where Seen) 1s the sample second moment of the s* 
n n 
observations. 
As indicated above we expect fn v. to be normally 
nee 
distributed, or approximately so. Thus it Wiel: be 


reasonable to use F-tests to test the significance of the 
regression and also to compute multiple correlation 
coefficients as long as the transformed equation (15) 


contains a constant term. This will be the case only if one 


of the functions g (n}) is equal to 1 / Yn for some j. We 
: A 


Will then also require the value 


'N 
(21) D= # /nm pis*] 
n=L D n 
for use in the analysis of variance table in the regression. 


We may thus accumulate data for the regression by 
recording nm , pl[s*] and pap S* for the n values of interest. 
n n n 


The necessary regression values are computed by means of 
(18)-(21) and may then be used to estimate er and ge 


according to (16) and (17). This means that we may deai 
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Non-parametric Quantile Estimation Through 
Stochastic Approximation : 


With arbitrarily large values of m with a relatively modest 
n 


(and fixed) amount of memory. Furthermore, we may estimate 
the parameters for several models with the same simulation 


output values. 


When we substitute the control variate estimate st for 
n 


s* in this analysis we obtain random errors vt which still 
n n 


have zero mean but whose variance properties are unknown. 


From (9) we have 


Var{st] = Var{s*] (1 ~ p#?) 
n n n 
= a(i-a) (1 - p2) 
fa n 
so that Var[ st] decreases at least as guickly as i/n. Wwe 
n 


adopt the hypothesis, then, WING Var{stj] = O(n7!t), 
n 
recognizing that we will have to validate the conjecture 


based on the Simulation output. The constant of 
proportionality in this case will be less than o2 £4x2because 


of the variance reduction obtained through the use of st. 
n 


Since the control variate P will have a distribution close 
n 
to normality for moderately large values of n, we expect the 
distribution of vt to be once again approximately Gaussian. 
n 


D. Simulation and Regression Results 


A summary of the cutput from a Simulation in which 
on NG for the exponential distribution was estimated appears 


om fable V. The estimation used the algorithn of Section 
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Non-parametric Quantile Estimation Through 
SCOCHIaAStae Approximation 


TiII.D.2 and the maximum transform with v = 56 (see Figure 9 


in Chapter II for an example of the distribution of = in 


this case). Vaiues of n (i.@., number of steps) ranging 

from 7 to 1590 were investigated with m = 40,000 
n 

replications per step. A regression uSing all this data 


will thus have 6,000,000 degrees of freedom. 


The first question we address here is whether the 
observed variances are adequately described by Our 


assumption of g?/n or not. The variance of s¥** is 
n 
asymptotically 2.361 / n (see (2.11) ) but the order of the 
variance of st* is in general unknown. A Simple linear 
n 


regression on the data of Table V shows, however, that 
Vohietoye> =~V6.00003) + 12 

n 
Varest i= -0.00322 + J 

n 


The regressions are both significant (respective F-ratios 
are 141,550 and 12,311) and neither suffers from lack of 


fit; we thus conclude that our assumption of a 1 / n factor 


in the variances of s* and st is justified. The relative 
n n 


Sizes of the coefficients of the 1/n terms indicate that the 


control variate scheme results ina 35 % variance reduction 
miuethis Case; in fact, 0.87762/2.53712 = 0.346. 
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None parametrr+c Ouantile Estimation Through 
Stochastic Approximation 


X Sample 
n Size pl s** j G2f sk! ] fi{st' ] G2{st* ] 
n n n n 
1 168 =O. 11274 0.65112 =O li2e4 G2 O52 
2 224 -0.04441 aS ese -0.04338 0.456 98 
3 280 0 ihe 0.45439 =O. 01847 Ol32622 
uy B36 =O Oa 9 O70 023 -~0.01015 Osi SMe) 
5 SoZ -0.00954 0.34222 =e 00719 0.20001 
6 448 —(),.00 673 0.30674 = 0. 007 OG O17 032 
7 504 =0 069 1 ee da ai Zon W0G2s 0.14684 
8 560 0.00241 0.25076 S017 Oey, 0.12566 
9 6 16 G2 00133 523395 -0.00124 0.11364 
10 672 =I 24 5 0.21860 = 05:00 450 Or Oa 
11 728 -0.00734 0.20184 ~0.0074K2 0.09279 
12 784 -~0.00821 0.18674 = UATOAE STUN 0.08458 
13 840 =O .010 7 1:0 Vo Ves Z 0 00 B40 0.07819 
14 896 2) UE Oh, 0.16646 =o 00G23 O07 O78 
15 9572 =O OOOO z2 0.15746 = a0 Gas 0.06600 
16 1008 Oe O93 0.14916 -0.00604% 0.06108 
17 1064 -0.00872 0.14253 -0.00854 ORE Chey sha) 17 
18 1120 — EO ig 3 0. 13442 S16 SS sis 0.05400 
19 1176 -0.00391 Oseb22 5 -0.00760 0.05001 
20 ize 2 =O O07 oO On 12209 sO 00877 0.04712 
21 1288 =0.00901 0.11687 -~0.00887 0.04490 
IAF 1344 =) .005965 Oe (itz 10 ~0.00872 0.04184 
23 1400 =O ioe OeO7 11 -0.00964 0.04074 
24 1456 ~0.00847 0. 102686 ~0.00921 J20S 823 
ZS 1512 a0! OKO Sys, 0.10043 -0.00744 0.03684 
Table Vig Estimated bias and variance of the improved 


stochastic approximation estimator for the 0.99 quantile of 


the exponential distribution. Algorithm of Section III.D.2 


and maximum transform (v = 56) were used. 
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MOUS eV eae Quantile £E 


tochastic Approximation 


stimation Through 


X Sample 

n —s Size Bls*'}  a2[s*"] PALst*} GE st") 

26 1568 Si) 5 (OMNES) 2 0.09470 -0. 00936 0.03448 

27 1624 -0.00865 0.09208 -0.00894 Oan03 272 

28 1680 -0.00670 0.08803 -0.00878 0.03116 

29 1) SNS ~0.00897 0.08668 -0. 00880 0.03046 

30 1792 =e OO0 7 0.08380 >On OOo a2 0.02841 

31 1848 -0.00914 0.08040 -0.01007 Oa02738 

32 1904 -0.00938 0.07866 -0.00931 0.02660 

33 1960 =renOet 0 0.07589 -0.00981 0.02526 

34 2016 -0.00945 0.07496 -0.00881 0.024 88 

a5 2072 OOo 2 0.07106 molt OU SAE: Vea02275 

36 2128 10) (6) (HONS) 0.06944 -0.01005 0.02256 

37 2184 —Oevu0918 0.06846 -0.00906 0.02181 

38 2240 05 ONO CI 0.06662 -0.00806 0.02086 

39 2296 HO) OOS 7s: 0.06461 -~0.01057 0.01993 

40 2352 =10) 5 WQS) oy 0.06278 -0.00918 0.01965 

uj 2408 000927 0.06100 =05 0067 2 0.01875 

42 2464 -0.00945 0.06044 -0.00918 0. 01823 

43 2520 Oe OMG ST 1/83) 0.05885 ~0.00964 0.01793 

uy 2576 Oem 4 00578 2 et) OUI OL 01717 

45 2632 -0.01146 0.05545 ~0.00977 0.01640 

46 2688 -0.00819 0.05434 -0.00822 0.01603 

Y7 2744 -0.00989 0.05362 ~0.00901 are U4! yeh: 

4g 2800 -0.00689 0.05316 = 0 SOOO 0.01545 

49g 2856 O72? 5 0.05165 -0.01001 0.01491 

50 2912 -0.00889 0505052 ~0.00960 0.01434 
Table V. (Continued) Estimated bias and variance of the 
improved stochastic approximation estimator for the 0.99 
quantile of the exponential distribution. Algorithm of 


Section III.D.2 and maximum transform (v = 56) were used. 
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Non-parametric Quantile Estimation Through 
stochastic Approximation 


X Sample 
n Size fC s*! j G2{s*'] p{st! } G2( st ] 
n n n n 
51 2968 1) 103) he 0.04972 -0.00994 0.01394 
a2 3024 ~ 0.01072 0.04852 -0. 00934 0291369 
53 3080 O07 30 0.04734 — avn oc 0.01326 
54 3136 ~0.00847 0.04653 -Q0. 00956 Oe O27 
55 SaigZ -0.00839 O04 57 1 -0. 00910 0.01238 
56 3248 = 0. O9007 8 Os ONS 8: OR OOo i On O32 
57 3304 10) Orie 0.04454 -0.00924 O20 20s 
58 3360 =O 200955 0.04361 =O iz Ue Ol od 
a9 3416 ORG 70 0.04257 =O. 00904 0.011480 
60 3472 -0.00843 0.04242 -0.00854 Op Odirsis 
61 Bo26 -0.00739 0.04161 —OmeOe7> 0.01108 
62 3584 ~0.00581 0.04053 =O ote OerO0ls 7 
63 3640 =O 0.04038 =0, 00854 0.01049 
64 3696 =-0. 00919 O63 0 14 -0. 00831 0. C1007 
65 512) SW =F, 00 9019 0.03870 =O. OO Bob OOO S72 
66 3808 -0.00841 01903795 1) OOS) 8 0. 00968 
67 3864 -0.00825 0703:750 -0.00882 OFOCg 53 
68 8920 10) 100) 7/3: OR O3icuS = Ong 3 Os OOS) Ake 
69 3976 [OmOOe7 9 0.03666 = OO a0 OFC 0937 
70 4032 -0.00731 0.03593 -0. 00906 0.00893 
71 4088 =O 0 0G 0045.08 -0.00916 0.00879 
2 414g = 0. O10 G14 0.03477 10) ONO a 0. 0Ce7 2 
v3 4 200 Op U0on © 0.03449 -0.00845 0.00841 
74 4256 =9. 00906 0.03344 -0.00848 0.00823 
75 4312 -9.00932 0.03330 -0.00941 0.00807 
Table ¥. (Continued) Estimated bias and variance of the 
improved stochastic approximation estimator for the 0.99 
quantile of the exponential distribution. Algorithm of 
Section III.D.2 and maximum transform (v = 56) were used. 
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Non-parametric 


Stochas 


t 


Yantile Estimation Through 
1c Approximation 


X Sample 
n Size F[s**]  G2{s*"] f{st'}] G2[{st"] 
n n n n 
76 4368 10) SO) iN: Hos sus =O ac Pale her 
We] G4U2Q4 a0) ORONO sg: O7032.99 =O20'0 37-2 0.00795 
78 4480 =O GU Cbs 0.03210 One ge 6 Oa00va 7 
79 4536 =O. 0.0320 0.03166 =O 20) 7 OG 0.00751 
80 4592 Ur eeO Oo oz OE 03082 0 OMe as 0200732 
81 4648 reo 097 | ORs 091 -0.00816 O-C07 S14 
82 4704 SU OOgus 05 085 SOO 7 7 OZ 007 17 
83 4760 0500S 2 Qe 0s 0G7 -0. 00744 G2007 25 
84 4816 Oe 009 FI Os WASTE, -0.00807 On 0G 
85 4872 OOO 7 del 0.02934 sOe0G7 71 O20 0672 
86 4928 Oey Go OE 02907 =0200792 0.00669 
87 4984 SOO 6 IS OmUZ899 =O.00G23 0.00662 
88 5040 SOOO 7 od 0.02860 Oe 0 0336 0.00643 
89 SoG HOE OOS Es 0.02844 a 0 Ores 0.00648 
90 SoZ 1) 5 (0) Was Onze ta SU) 1G PE SHE 0.00634 
91 5208 =O00 767 0.02742 =O. 00796 O00 Zz 
eZ 5264 =0 . 00W# 3 On 2 32 0,007 73 0.00607 
73 5320 =O OG 2 0202703 —U no 07 10 0.00608 
94 313) 16 -0.00714 Of 02662 =0.200778 0.00602 
25 5432 =O 7.86 0.02626 -0.00743 0. 00S 78 
96 5488 sel Wic LONG S0H(0) We O263)2 =O eOUy ot O0057 7 
a7 5544 =O 07 O19 0.02584 ~0.00804 9200 569 
98 5600 =O OO 2S 0.02541 -0.00704 0.00554 
39 S056 =O OWio7 & 0.02512 -0.00798 0.00549 
100 SH iz = OO OS tn U0 72535 -0.00744 0.00555 
Table V. (Continued) Estimated bias and variance of the 
improved stochastic approximation estimator for the 0.99 
quantile of the exponential distribution. Algorithm of 


Section III.D.2 and maximum transform (v = 56) were used. 


131 





Non- epee enc 


tochas 


t 


uantile Estimation Through 
1c Approximation 


X Sampie 
n Size ies | o2{s*'"] fC(st'] oz(st'] 
n n n n 
101 5768 -~0.00796 OF 02534 =O OO 7 U7] 0.00543 
102 5824 ~0.00758 OO2he 7 (0) 5 (ONO TEU 0.00521 
103 5880 ~0.00579 0.02421 ~0200699 0.00513 
104 5936 -O.0 06602 0.02404 -~0.00718 0005 12 
105 518) $7 ~0.00628 0.02405 =e 10756 S200S 42 
106 6048 ~O,00 7o4 0.02367 -0.00761 0.00501 
107 6 104 -~0.00690 Ce 0Z2326 -O2 00692 0.00490 
108 6 160 -0.00764 02023 28 -~0.00748 0.004 86 
Os 6216 OO) O17 0.02296 SOI 2 a 0.00481 
110 6272 -0.00776 0.02271 Or 10m ORO Jon 
111 6328 =e 007 23 0202263 =O. 006 98 0.00471 
112 6384 -0.00856 0702 198 ~0.00695 0.00450 
113 6440 =02007 19 O22 12 -0.00710 0.00459 
114 6496 ~0.0079% 05 022.00 SG OUST, 0.00444 
115 6552 -0.00691 0.02140 -~0.00681 0.06443 
116 6608 -~0.00687 OL 02477 =O. 007s) 1 ORO OU 39 
ee? 6664 -0.00690 0.02124 ~0.00726 0.00429 
118 6720 -~0.00634 O02 406 ~0.00690 O00 22 
119 6776 -~0.00795 0.02066 -0.00726 0.00417 
120 6 832 -~0.00711 Or07 08:2 =O COT 2% 0.00415 
121 6888 OU 1G 002053 ~0.00653 0.00414 
22 6944 -~0.00684 OZ OZ069 -0.00652 0.00409 
23 7000 ~0.00696 0202058 =0200723 0.00404 
124 7056 ~0.00644 0.02007 ~0.00670 O70 03:57 
125 a 2 -~0. 00606 0.02019 SOOO 125 ORO s 3 
Table V. (Continued) Estimated bias and variance of the 


eles Mal 25S, 
Algorithm of 


improved stochastic approximation estimator for 
guantile of the 


Section L[11.D.2 


exponential distribution. 


and maximum transform (v = 56) were used. 


ie 





Non- Pecans 


tochas 


sti 


Yantile Estimation Through 
1c Approximation 


X Sample 
n Size PLs**#} 9 G2(s*"] flst']  o2[st"] 
n in n n 
126 7168 NTFS RO SONS) 0.01991 -0.00641 O20 0387 
a27 7224 ~0.00779 0.01968 =OeOO 7 IZ 0.00383 
128 7280 0.00720) O02 34 ar Oboe GZ003 7] 
129 7336 -0.00660 Oe SO OGIO a T0038 75 
130 7392 15) US 2h 7) O01929 -0.00713 0.00370 
131 T4H48 Oe 00 754 Oe Oreo -~0.00649 VeO0 S69 
132 7504 -0.00681 0.01903 0) 5 CIOS te, 0.00360 
133 7560 O00 G5 4 0.01885 =Om0 722 02003162 
134 76 16 Ore 0 OSiSio 0.01874 =OmoCo sd 0.00361 
135 Ge 2 -~0.00651 0.01842 -~0.00614 000353 
136 4/28 ~0.00601 0.01831 -0.00640 0.00347 
is 7 7784 SO ViooG 0.01829 = On UGGS 0.00345 
ays 7840 -0.00646 0.01795 0) LONG TS a's) D003 39 
139 7896 =O, 001630 0.01793 -0. 00693 0.00336 
140 a2 ~0.00634 0.01762 -0.00634 0.00328 
141 8008 -0.00608 0.01773 0% 0063 1 0200333 
142 8064 ~0.00630 0201760 -0.00618 Oe 00326 
143 8120 Oe CUO a 7 0.01724 -0.00640 O:.00.3 23 
144 8176 -0.00701 0.01737 =Ji010'635 V2 003 22 
145 8232 0 O00 702 0.01702 1) 5 eee sis: Or00 3-12 
146 8288 OS ON) Fi GUS O70:17-06 = Os 0: 0iothO 02003 13 
147 8344 0 0055? 0.01695 —On.0 Gic0 2 0.00307 
148 8400 -0.00637 0.01680 =Os00621 0.00315 
149 8456 = renin 0.01704 -0.00610 0.00310 
150 oS 12 -0. 00607 0.01668 =O Ua 7 > 0.00304 
Table V. (Continued) Estimated bias and variance of the 
improved stochastic approximation estimator for the 0.99 
quantile of the exponential distribution. Algorithm of 


Section III.D.2 and maximum transform (v = 56) were used. 
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Figure 15. Expected bias of the stochastic approximation 


estimator Ss for the 0.3939 quantile of the 2xponential 
n 


distribution (Y-axis) vs. step number n (X-axis). 
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Non- PS NS Sa Juantile Estimation Through 
tochastic Approximation 


1. Order of the bias 


We now proceed to direct consideration of the bias 
estimates in Table V; the control variate estimates of the 
bias are plotted in Figure 15 where it may be seen that 
there is a definite decreasing trend. The rate of decrease 
appears to be very slow, however; furthermore, there are 
marked irregularities in the first few steps. Since we are 
for the most part interested in the large sample behavior of 
the stochastic approximation quantile estimators we suppress 
the initial instability by including in the regression only 
the estimate values from steps greater than 50 (1.e., X 


Samples larger than 2912). 


Carrying out a linear regression using the model (5) 


results in the estimates 


Eo = 0.00264 + 0.00174, 
(22) E = -0.14103 + 0.03330, 
F. = 0.39692 + 0.15633; 


the second figure given is the standard deviation of tne 
estimate. Assuming that the errors in (5) are approximately 
normally distributed, we compute the following analysis of 


Wartiance table: 
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Non-parametric Quantile Estimation Through 
Stochastic Approximation 


source Sum of Squares Degrees of Mean Square 
Freedon 
Constant Zi, oe. 10983 1 2%1,985.1098 
cr (0) Moe 300) 1 ales 3) 
mii), 4 (2) PSEA 8) 1 Zo Oe O 
Regression 30.30378 2 156-1519 
Explained Zep0 ioe 136) 3 T, 338.4712 
Pure Error 2,245,341.29482 Us 9,099 Os 5550 
ietck Of Fit &7.68185 98 0.4865 
Residual Dae eo, Sob. 91627 089 499 Oe S50 
Total Zr2o7 400. 38989 4,040,000 0.5612 


The regression 1S significant aS measured by the 
F-ratio of 27.2618 which 1s significant at the 0.999 level. 
The ratio of the sum of the squared deviations about the 
regression -line ("pure error} to the squared deviations 
between the fitted and mean biases ("lack of fit") is 0.8754 
which is not Significant at the 0.9 level; we thus conclude 
that the fitted line adequately describes the data of Table 
V. Note that our hypothesis that cfc = 0 is certainly 


consistent with these results although the F-ratio of 22.568 
Will not allow us to reject the a term as not Significant 


in the regression. 


One problem encountered in most of the regressions 


Sarrzed out on this data is the high degree of 


T 
multicollinearity in the GG matrix when more than just a 


= =. ; , 
few terms of the form g _(n) = nh clea are included in the 


model. The result of this multicollinearity is considerable 
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variability in the fr. estimates as measured by the standard 


errors as well as some irregularities in the analysis of 


worrance. This 1S one reason that so much data had to be 


accumulated for this experiment. 


Discriminating between the model (5) and a model such 


as 


-174 Bi sz ~3/4 
oe tran ir Hh 


23 E[s*] = 
(23) [s*] SS ; 


also requires a great many observations on s*. The results 
n 


of a regression using (23) are 


os = 0.03821 + 0.02146, 
r, = ~0.34324 + 0.13249, 
F_ = 0.46642 + 0.20350. 


The e coefficient estimate is thus just Significant at the 


C29 level while Es is significant at the 0.99 level. An 


-174 
analysis of variance indicates that the n tern 


contributes 1.763 to the regression sum of squares while 


the other two terms contribute 28.565. Aithough neither an 
F-test nor a t-test will allow us to reject the low order 
term as not statistically significant this regression 


provides convincing evidence that the order of the bias is 


Sz, 





Non-parametric Quantile Estimation Through 
SeeecnactelLe APOrOximat10n 


4 


= 2 
in fact n aS indicated by the theory. This is certainly 


a considerable improvement over the results of Yuguchi [38]. 


A regresSion was also carried out using the model 


(24) (Ef s* J] = co -. 
n 1 


to attempt a direct verification of the order of the bias. 
(24) can be handled as a linear model by using a logarithmic 
transform on the data. It is apparent from Figure 14 that 


higher order terms have an important effect on the bias; 


ie 

therefore a power series in n Was also fitted using 
nonlinear regression. Unfortunately, the results rere too 
unstable to be of much use. To minimize the effect of 


higher order terms, then, we include in the regression only 


data from the later steps. Yhe resulting estimates are: 


Lowest Step 


in Regression “4 : 
1 0.00912  -0.04258 + 0.03146 
20 C2070 02 =Oeeteze +t 0.013974 
50 0.04305 wOeese227) t 0501562 
100 E05 5°25 ~0.46887 + 0.04677 


Based on these results we conclude that the data of 


mage 
Table V display a definite n trend and that the evidence 
does not seem to warrant the assumption of a lower order of 


bias. 


One way to explore more fully the effect of the initial 


Starting point on the bias of the stochastic approximation 
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quantile estimator is to begin the procedure with S fixed 


at some value of interest instead of using random values as 


an Table V. This has been done for values of §$ between 0 
1 


and 9 (corresponding to initial hiases from -4.5 to 4.5). 
We then carry out a regression uSing the model 


all 2 = I 
EL s*}] = rin Wa See Ge 
n Z 


the resulting estimates =. are plotted in Figure 16 and 


summarized in Table VI. 


We conclude from Figure 16 that the bias of the initial 
estimate plays a significant role in determining the 
asymptotic bias of the stochastic approximation quantile 
estimator. This is in general agreement with the results of 
Hodges and Lehmann [15]; although the relationship of Figure 
16 is clearly not linear, the asymptotic bias apparently 


increases with increasing deviations in ar There is 


insufficient data here to investigate the relationship more 


fully, but the guadratic fit 
r = ~0.112 + 0.023 a - 0.004 Se 


plotted in Figure 16 seems to describe the data fairly well. 
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D m Standard 
4 “4 ECLOL 
0.0 ~0.09672 0.00652 
0.5 ~0.12741 0.01783 
1.0 -0.07937 0.01741 
ie 5 Oe 2 0.01699 
2.0 -~0. 11091 0.901641 
265 -0.10804 0.91582 
3... 0 -0.09765 0.01508 
3.5 -0.09152 0.01460 
4.0 ~0.10550 0.01430 
G.5 -0.074805 0.01406 
4.605 ~-0.07898 0.00503 
10 =O 215 0.01432 
os 5 0 2 O10) 0.01694 
6.0 -0.09135 0.01602 
6.5 -0.09890 6.01694 
7.0 =O 05 544 O20 1729 
5 ~0.16990 0.01883 
8.0 -~0.19678 0.01981 
es, =O. 19846 0.02061 
9.0 =0. 26129 0.02151 

= VL 
Table VI. Estimated coefficients for the O(n jee cern sean 


the bias of the stochastic approximation estimator for the 


0.99 quantile of the exponential distribution as a function 


of the initial starting point, sae Estimated by linear 


regressions which included 1000 replications of steps 50 to 


150 of the stochastic approximation process. 
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Figure 16. Estimated coefficient of the n term in the 


bias of the stochastic approximation estimator for the 0.99 


quantile of the exponential distribution (Y-axis) vs. the 
PeetoeOr the initial starting point § . fhe vertical lines 
1 


represent two estimated standard deviations about the 


estimated coefficients. 
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2. Comparison with order statistics 


| -1/2 
The presence of the nn bias term puts stochastic 


approximation quantile estimators at a disadvantage when 


compared with order statistic estimators whose bias is 
O(n~!). The data of Table V, however, indicate that the 
stochastic approximation biases are quite small as compared 
with the estimator variance. The net effect of the bias, 
then, will be to inflate the asymptotic mean squared error 


Slightly. Based on (1) and (22) we have 


Ge 
MSE([S'] = Var{S']) + _1 + O(n?) 
nN n 


--> 2.361 + 0.020 


a 


which should be compared with the order statistic case: 


MSE[ 1 = Vari hes 


S } + o(n7t) 
(n Vv 

768 . 

n -— 


eB 

-~-> US 
(Recall that the order statistic estimator will be based on 
the entire X sample and not just on the section maxima.) 
Most of the asymptotic difference between the two quantile 
estimators is thus due to the variance inflation (1.15) 


which accompanies the use of the maximum transform. 
A comparison between finite sample order statistic and 


stochastic approximation quantile estimators is presented in 


Table VII and plotted in Figures 17 and 18; Figure 1/7 
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Non-parametric 
Stochas 


X Sample 
n Size 
1 168 
2 224 
3 280 
uy 336 
5 392 
6 4ug 
7 504 
8 560 
5 6 16 
10 GZ 
11 Uae 
12 784 
13 840 
14 896 
15 S152 
16 1008 
7 1064 
18 1120 
19 1176 
20 23 2 
21 1288 
ISD 1344 
23 1400 
24 1456 
25 15) 
Hoole Vil. 
approximation 


Bias 


~0.11274 
~0.04338 
~0.01847 
~0.01015 
-~0.00719 
-0.00708 
~0.00625 
Ue Ol Ola 
-0.00124 
~0.00450 
~0.00742 
~0.00506 
-0.00840 
=O Wo 24 
soll OPO OT ORS 
-0.00604 
=Ue0058 54 
=0200045 
-0.00760 
7Q0.00777 
-~0.00887 
=U 00372 
=O 0 954 
= Ores 2 
-0.00744 


Comparison of order 


estimators for the 


exponential distribution. 


Stochastic Approximation 


MSE 


0.66383 
Vo 2 35/6 
0.45473 
0. 39043 
0.34227 
Ona05 79 
0.27741 
Rez 1076 
Oe 25595 
5 Asie 
OSV te, 
0.18676 
Oiere 659 
0.16650 
0.15750 
0.14920 
0.14260 
0.13449 
0.12730 
Oma 245 
Ore 1 less 
0.11217 
Oe a0g 20 
0.710294 
0.10049 
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Order Statistic 


Blas 


0.09898 
~0.11408 
Ve euco Z 
~Q.04269 
0.11125 
COs. 
=O. 00) 2 
0.01754 
= OG 055 910 
0.03305 
Ole Wes isi 2 
0.04423 
So OU ASH 
0.05269 
0.00217 
~0.04070 
0.01334 
OOo © 
0.02247 
=O ena 1 
0.03007 
~0.00431 
~0.03493 
0.00427 
-~0.02466 


statistic and 


quantile 


NSE 


0.64880 
0.40349 
0.40317 
0.28267 
ee Slain, 
OZ a2 
0.18704 
Oa17985 
0.15483 
Oe oS) 15; 
0.43265 
Onlesiorze 
0.11646 
Oe ony 
0.10412 
OPUS Seis 
0.09440 
0.08670 
0.08656 
O07 525 
0.08009 
O0733-2 
0.06944 
0.06027 
0.06444 


stochastic 


of the 





X Sample 
n Size Bias 
26 1568 -~0.00936 
Og | 1624 ~0.00894 
Zo 1680 10) 5 On ONeh es: 
no 1736 -0.00880 
30 92 0) 0) 2 chs 
BA 1848 -~0.01007 
32 1904 —020093 1 
33 1960 -0.003981 © 
34 2016 =O BO) 
35 202 =) SOO 23 
36 2128 Sill) OOO ES: 
37 2184 -~0.00905 
38 2240 -Q.00806 
39 2296 = Or 7 
40 2352 Sl) 6 OSs) 
4 2408 SW Oise YA 
42 2464 0) 5 aie 
43 PAS DY, -9 00964 
Gy 2576 =O UiOigaZ 
45 2632 sO) ISN TF 
46 2688 = ene Gi 2 
47 2744 -~0.00901 
48 2800 =O O02 
4g 2856 -~0.01001 
50 ZA, ~0.00960 
Table VII. (Continued) 
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Stochas 


Stochastic Approximation 


MSE 


0.09479 
0.09216 
0.08811 
0.08676 
0.08387 
0. 08050 
0.07875 
0.07598 
0.07504 
1,07 115 
0.06954 
0.06854 
0.06668 
0.06472 
0.06286 
0.06108 
0.06053 
0.05894 
0.05791 
0.05555 
0.05441 
0.05370 
0.05324 
0.05175 
0.05062 


Comparison of 


order 


Order Statistic 


Bias 


0.01169 
= On uuay s 
Oe C116 
=0.007308 
02.02 3810 
~0.00093 
in 21 2 
OT 00S 26 
Oe oo 
0.01082 
-0.01014 
0.01583 
=—O7 00431 
C0202 7 
0.00099 
O50 da 15 
0.00583 
illo al 
Q.01027 
=O Oia S 
0.01436 
ORO 206 
=04 O57 
OS GUZ 23 
-0.01284 


MSE 


UE Vibe 39 
OU G0Z2 
0.06032 
0.05661 
0.05714 
O02 50 
0.05131 
0.05079 
0.04855 
0.04841 
0.04614 
0.04630 
0.04401 
0.04442 
0.04212 
0.04069 
0.040484 
0203895 
0203672 
0.03740 
020857 
0202600 
0.03504 
0.03474 
O08 Sez 


statistic and 


Stochastic approximation estimators for the 0.99 quantile of 


the exponential distribution. 
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Non-parametric 


X Sample 
n Size 
51 2968 
52 3024 
53 3080 
54 See 
55 3192 
56 3248 
SM 3304 
58 3350 
59 3416 
60 3472 
61 3528 
62 3584 
63 3640 
64 3696 
65 BE) sy 
66 3808 
67 3864 
68 3920 
69 3976 
70 4032 
71 4088 
72 4144 
a3 4200 
74 &256 
ies) 4312 
Table VII. 


(Continued) 


Bias 


-0.00994 
-0.00934 
-0.00798 
-0.00956 
~0.00910 
~0.00971 
-0.00924 
~0.00992 
-0.00907 
~0.00854 
-0.00875 
-0.00818 
-0.00854 
-0.00831 
-0.00856 
0.00863 
-0.00882 
-0.00829 
-0.00840 
-0.00906 
~0.00916 
0 0G 
-0.00845 
~0.00848 
-0.00941 


Stochastic Approximation 


MSE 


0.04982 
0.04861 
0.04740 
0.04662 
0.04579 
0.04522 
0.04462 
0.04371 
0.04265 
0.04249 
0.04168 
0.04060 
0.04046 
0203918 
O20S877 
0.03803 
0.03756 
Ca0so7| 
0.03673 
0.03601 
0.03516 
9.03483 
0.03456 
Cav oeet 
SUNS PS) 8) 8. 


Comparison of 


order 


uantile Estimation Through 
Stochastic Approximation 


Order Statistic 


Bias 


0.00620 
-0.00844 
0.00991 
-0. 00434 
0.01336 
~0.00050 
~0.01371 
0.00309 
-0.00979 
0.00647 
~0.00611 
0.00964 
-0.00264 
0.01263 
0.00064 
-0.01087 
0.00373 
-0. 00752 
0.00666 
-~0.00436 
0.00944 
-0.00135 
-0.01174 
0.00151 
-0,00868 


MSE 


0.03360 
O03 257 
0.03256 
0.03144 
0.03161 
0.03046 
CEG 28) 7) 8! 
0.02956 
0202879 
0.02874 
U2 027 22 
GU, Ss 
C2027 is 
0.02728 
0.02640 
G202555 
We 5 7) 
OS 77 
Omi2 in 
0.02446 
0.02453 
CEU 236.6 
0.02343 
0202250 
OR 02Z255 


statistic and 


stochastic approximation estimators for the 0.99 quantile of 


the exponentiai distribution. 
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X Sample Stochastic Approximation Order Statistic 
n Size Bias Sse Blas MiSs 
76 4 368 ld oo Oe 03338 0.00422 0.02278 
uy Q4u2u ood ON OT 0.03307 -0.00577 C20 2228 
78 4480 -0.00828 O03 2 16 0.00681 G2 027229 
79 4536 -~0.00756 OS08 172 0) 010 28 O20 Z177 
80 4592 S10) GI Sis 0.03089 O00 49278 O70 2485 
8 1 4648 “0.00816 ©.03098 OOO st 0.02129 
BZ 4704 ~0.00777 O20 3094 -0.00964 0.02093 
83 4760 -~0.00744 0.03088 0.00279 O02 Us 
84 4816 ~0.00807 O02 3 '6 = 00695 0.02046 
85 4872 SO OI 0.02940 0.00464 0.02043 
86 4928 ~0.00782 0.02913 =) QWs 7 0.02002 
87 4984 10) 1008 Ass. 0.02905 0.00693 A200 5 
88 5040 = enue 36 Cuz eoT ~0.00190 O201gGa 
89 S096 =0. 00823 0.02851 COG oA 0.01969 
90 5152 -0.00736 0.02817 0.00047 0.071922 
91 5208 =O 107 0b 0.02749 =O, 00795 0.07892 
92 5264 -0.00773 O22 736 Oe 00270 0.01886 
ons 2320 -0.00710 0.02708 ~0.00554 O20 1853 
94 5376 =O Oso 0.02668 0.00493 0.01853 
95 5432 -Q0.00743 O07 Gs 2 10) ONG) 2S) 0.01817 
96 5488 =O Om On On osd 023007703 0.01822 
ay 5544 -0.00804 0202594 =O200 1011 0.01784 
98 5600 -0.00704 0.02546 -0. 0088 1 O70 176-0 
By 5656 -0.00798 0.02518 0.00114 020 igo2 
100 Sz -0.90744 0.025490 -0.00656 0.01725 
Table VII. (Continued) Comparison of order statistic and 


stochastic approximation estimators for the 0.99 quantile of 


the exponential distribution. 
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X Sample Stochastic Approximation Order Statistic 
n Size Bias MSE Bias MSE 
101 5768 -0.00747 0.02540 0.00320 O02 Ony23 
102 5824 -0.00747 0.02433 =O Oe 0.01694 
103 5880 0 SOD SS, 0.02426 200549 0201695 
104 ag30 S19 GO) FS: 0.02410 Or ceo 0.01664 
105 592 =O 200 756 0.02411 0.00711 0.01669 
106 6048 =Os0076 1 OE 3 73 =n 26 0.01636 
107 6104 SO OG 9 Z Onn 25510 -~0.00744 0.01615 
108 6160 =O coc O70 2333 0.00169 0.01610 
109 6216 0007 29 0.02301 Mii 10) Soh) 020 2e7 
hie 6272 = Orel 0 0.02276 0.00358 0.01585 
111 6328 105 (OU Sits: Oe 2 265 ~0.003480 O-Olace 
v2 6 384 =O 0 O25 0.02198 0.00541 0.0 Tac 
113 6440 -0.00710 OR 0rZ 201} sO O nds 03 055 
114 6496 =a a7 0202205 0200717 0.01540 
115 Goa2 19) Ue s) 0.02144 0.00037 0.071511 
116 6608 res | OR0Z 132 —UeOUO27 0.917493 
117 6664 OO O72 CeO g 0.00217 O20 Tied 
118 6720 =O 00 G90 O20 2411 = 00439 0.01469 
ao 6776 ONG F/ Ais 0.02071 0.00391 0.01468 
120 6 832 OO Oat OFC 2037 =O2002 57 0.01446 
iz 1 6 888 =10) (ters 3 O20 2057 C0036 0 0.071448 
122 6944 -0.00652 0.02073 ~0.00080 0.01424 
23 7000 Omi m23 ORO 20G3 =e i007 OS 0.01409 
124 7056 -0.00670 OPAC ie 020009 1 0.017404 
12s 72 -0.00725 0.02024 =). 00527 0201387 
Table VII. (Continued) Comparison of order statistic and 


stochastic approximation estimators for the 0.99 quantile of 


the exponential distribution. 
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X Sample Stochastic Approximation Order Statistic 
n Size Bias MSE Bias MSE 
126 7168 -0.00641 0.01995 0.00258 0.01385 
127 7224 -0.00712 0.01973 -0.00353 0.01367 
128 7280 ~0.00695 0.01938 0.00419 0.01367 
129 7336 ~0.00661 0.01927 -0.00185 0.01347 
130 7392 -0.00713 0.01934 0.00576 0.01350 
131 7T4U8 -0.00649 0.01895 ~0.00021 0.01329 
132 7504 -0.00679 0.01907 ~0.00605 0.01315 
133 7560 -0.00722 0.01890 0.00138 0.01311 
134 7616 -0.00672 .0.01879 -0.00440 0.01296 
135 1672 -0.00614 0.01846 0.00293 0.01295 
136 728 -0.00640 0.01835 -0.00278 0.01278 
137 7784 -0.00663 0.01834 0.00443 0.01279 
138 7840 -0.00535 0.01800 =0.00122 0.01261 
139 7896 -0.00693 0.01798 0.00590 0.01265 
140 7952 -0.00634 0.01766 0.00031 0.01245 
141 8008 COO GsT OO 1777 -0.00518 0.01232 
142 8064 ~0.00618 0.01765 0.00179 0.01230 
143 8120 -0.00640 9.01728 -0.00363 0.01216 
144 8176 -0.00635 0.01741 0.00324 0.01216 
145 8232 ~0.00638 0.01706 =0.00213 0.01200 
146 8288 -0.00640 0.01710 0.90465 0.01202 
147 8344 -0.00602 0.01699 -0.00066 0.01186 
148 84.00 -0.00621 0.91684 -0.00588 0.01175 
149 8456 -0.00610 0.01707 0.00075 0.01172 
150 8512 -0.00575 0.01671 -0.00440 0.01160 
Table VII. (Continued) Comparison of order statistic and 


stochastic approximation estimators for the 0.99 quantile of 


the exponential distribution. 
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Sie al | 
I a 
ew 53 


Bmagure 17. Mean squared error of the order statistic 








estimator (lower curve) and the stochastic approximation 
estimator (upper curve) for the 0.95 quantile of the 
exponential distribution vs. the number of stochastic 


approximation steps. 
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Stoenas 





Figure 18. Bias of the order statistic estimator for the 


0.99 quantile of the exponential CYetrE UDG Ons eee same 
horizontal scale as in Figure 17 is used. 
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displays the mean squared errors of the two estimators while 
mune 18 is a plot of the bias of the order statistic 
estimator. Values mow the Stochastic approximation 
estimator were obtained from the simulation data of Table V 
while the order statistic values were computed from the 
formulas for the exponential distribution (see David [5)) 


n 
wt 


Els ] = i-t , 
n 


i=n2u+1 


A n ; 
Var{s J =. 2 Te 
n 1=n-utt 


where u = [{a(n+1) }. 


The characteristic jagged appearance of Figure 18 
fmeectS the truncation inherent in calculating uw; it also 
makes direct comparison of bias terms digGicmic. 
Nevertheless it 1s clear that the stochastic approximation 
estimators are generally less biased than the corresponding 
order statistic estimators for X samples smaller than 3500 
observations while the biases are roughly the same for 
samples of from 3500 to 5500 observations. For larger 
Samples, the asymptotic advantage of the order statistic 
estimators begins to assert itself and we find that the 
Stochastic approximation estimators are for the most part 
more biased. The mean sguared error plot (Figure 17) merely 
confirms the asymptotic Superiority of the order statistic 
estimator in terms of variance. Note that even when the 
stochastic approximation estimator is more biased this does 


not seem to have much influence on the nean squared Error. 


In practice the approximate order statistic estimators 
of Section III.A are often used in order to conserve 
computer memory; the problem with these techniques 1s that 
they may introduce an objectionable bias into the estimates 


(see Tables II and III). If stochastic approximation 
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quantile estimators were used in an approximate design, 
however, Table VI shows that for sample sizes small enough 
to be practical for the order statistic estimators the bias 
will be smaller for the stochastic approximation case, 
although the variance will be greater. This trading of bias 
for variance is also seen when the jackknife ({27], [{38]}} is 


applied to the order statistic estimators. 


Of course there 1S no need to carry out ae section 
averaging or nesting procedure with stochastic approximation 
quantile estimators; this 1s necessitated in the order 
statistic case because the requirement to store and sort an 
entire section imposes an upper limit on permissible section 
size. The fixed Memory size for the stochastic 
' approximation estimator, however, means that we may reduce 
both bias and variance by considering larger X samples 
directly without sectioning the data. In a practical sense, 
then, the stochastic approximation estimates are less biased 
than the corresponding order statistic estimators for very 


large data samples. 


E. Higher Moments and Distribution of § 
n 


4 * 
Besides the significant O(n ) term in the bias, 


another disturbing result of Yuguchi's thesis [38] was the 


apparent increase in the coefficients of skewness and 

kurtosis of S' with increasing values of n. The coefficient 
n 

of skewness of a random variable X is 


Pes = 7 * 


1 sy 
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a 


where pp = E{X] and oa = Var[X]}. Ye is zero for any 
Symmetric random variable, e.g. normal. The coefficient of 
kurtosis (sometimes called excess kurtosis) we define as 


ee OS Ot ae 


Vs 1s also zero for a normal random variable. 


If S* converges weakly (i.e. in distribution) to a 
n 
normal random variable it is desirable from a practical 
point of view for aS ) and Ee IE poth seo approach Zero 
n n 


aS n increases. Of course, weak convergence {or even almost 


Sure convergence) does not imply convergence in pth mean, 


beet, SO that - and ie need not even approach a finite 
limit; an example is provided by Figures 2 and 3 where the 


RM estimator converges in guadratic mean and in distribution 


but apparently not in third or fourth means. 


This problem does not occur for the new estimator, 
however. The sample means, variances and coefficients of 
Skewness and kurtosis are tabulated in Table VIII for 
one-fourth of the data from Table Von i.e. LG G00 


independent replications of s**' for the exponential 0.99 
n 


quantile using the maximum transform with v = 56. The third 


and fourth central moments were not obtained for the 
remaining 30,000 observations for each n value in Table VI 


in order to save computer time; the data that was collected 
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X Sample 
n Size Mean Variance Skewness Kurtosis 
1 168 -0.%71274 OE Go tar2 08 VAS Sle O233864 
2 224 -~ 0.05081 0.54749 0.62491 os See 
3 280 10) 0 ass 0.45878 0.46303 0.68834 
4 B36 = Oo On. 0.38981 0.49609 0.61954 
5 392 ~0.01409 0.34421 0.44722 On 0377 
6 443 —0. OO hy 0.30619 0.46074 0.48721 
504 -~0.00101 0.28681 0.46856 0.54782 
8 560 ~0.00628 Ga25 anu 0.44417 Ooo 1S hs 
9 616 0 0 Omni Or 3255 Oma Se9Z 0.33607 
10 672 ~0.00347 0.21493 0.44614 0.50744 
11 728 ~0.00430 Oa 20665 0.45871 Oaslis4 
12 784 0.00011 0.18656 C234 082 0.24865 
13 840 70200320 Oe 17232 0.38481 0.45498 
14 896 -0.00479 0. 16871 Ozsb254 0.36680 
15 252 ~0.00115 C2753 53 C237 0c3 0.61999 
16 1008 -0.00776 0.714934 Os 6es 7 0.45296 
17 1064 0 0 10 ys 0.13933 0226925 0.26946 
18 1120 ~0.00710 0. 13890 OR SS siait OG | 
ie 1176 =O CON. 7 0.12758 0.33897 Oeos 789 
20 1 BN Onno OM 6 Un tz ou Os 33085 0.42341 
21 1288 -~0.00948 0.11434 0.33706 Oe327 02 
22 1344 -0.00987 0.11342 0.35054 0.36447 
Z3 1400 —OmOne 0.10640 0.31868 0.31489 
24 1456 =O 0 Cong 0.10344 O229 160 0.27693 
25 12 —03 0027'S Oe 03892 0223703 0.40463 


Table VIII. Sample moments for 10,000 realizations of the 
stochastic approximation guantile estimator for the 0.99 


quantile of the exponential distribution. 





X Sample 
n Size 
26 1568 
27] 1624 
28 1680 
29 1736 
30 1792 
31 1848 
a2 1904 
33 1960 
34 2016 
35 2072 
36 2128 
37 2184 
38 2240 
a9 22276 
40 Za Z 
uf 2408 
42 2464 
4 3 2520 
4y 2576 
45 Zoe 
46 2688 
47 2744 
4g 2800 
49 Zeb 
50 ZZ 
Table Was 
realizations 
estimator 


distribution. 


< 


Non-parametric 


Stochas 


yantile Estimation Through 


2C APDroximation 


Mean Variance Skewness 
~0.00785 0.094142 0.31614 
~0.01234 0.09181 0.27344 
-0.01041 0.08664 0.24567 
-0.00649 0.08490 0.25753 
-~0.00879 0.08245 0.21642 
-0.00996 0.08109 0.28204 
~0.00900 0.08013 0.25671 
-0.00595 0.07646 04 ESE 
~0.00698 0.07306 0.23054 
-0.01030 0.07136 0.28635 
-~0.00765 0.06857 0.19870 
~0.01157 0.06731 0.28060 
-0.00767 0.06548 0.27972 
-0.00703 0.06339 0.24303 
-~0.00493 0.06313 0. 24346 
~0.00963 0.06112 0. 24595 
~0.00833 0.06057 Oe 275 
-0.01114% 0.05890 0.22788 
~0.01300 0.05687 0.29007 
-0.00895 0.05617 0.30989 
~0.01024 0.05525 0.25985 
-0.00915 0.05378 0.25192 
~0.00799 0.05225 0.22194 
~0.00881 0.05202 0.28420 
~0.00671 0.05032 0.24716 

(Continued) Sample moments 


the stochastic 
the 0.99 


155 


approximation 


guantile of the 


Kurtosis 


0.62028 
0.32623 
0.39986 
0.15673 
0.15205 
Genoese 
0.20375 
0.18382 
0.11701 
0.31507 
0.08330 
0.27422 
0.26658 
0.14686 
0.16414 
0.19981 
0. 23653 
O- Dowlays 
0.49698 
0.37786 
0.21686 
0.18870 
0.29575 
0.51976 
0.12429 


10,000 


quantile 


exponential 





Non-parametric Quantile Estimation Through 
1c Approximation : 


Stochas 


& 


X Sample 
n Size Mean Variance Skewness Kurtosis 
om | 2968 =O0 1012 0.04907 0.24310 C2257) 30 
52 3024 ~0.01059 0.04704 OR 2359 0.15058 
53 3080 ~0.00681 0.08792 224635 O22 low 
54 3136 -~0.00419 0.04660 On2 1263 O.11276 
je Se ~0.01063 0.04529 0.18625 0.24244 
56 3248 -0.00697 0.04626 On2soof 0213575 
57 3304 ~0.00725 0.04409 ORNS i245 0212925 
_ 58 3360 ~0.00981 0.04296 0.20815 0.11342 
Sy) 3416 =-010 733 0.04314 O75 23651 0.13994 
60 3472 ~0.07049 0.04116 O2.25520 Us3 500 5 
61 5S 2R5) 0 00903 0.04072 0.19669 0.06457 
G2 3584 ~0.00984 0.04113 Oe 12939 OF 15193 
63 3640 ~0.00832 0.03954 0.18986 Oval si 5 
64 3696 ~0.00730 0.03929 0.20019 0.19601 
65 BZ Un OO ou 7 0.03800 0.15993 Os 12 516 
66 3808 -~0.00937 0.03806 0.17441 0303520 
67 3864 =0.00703 005.837 Oe Sa 023s 3 
68 3920 -~0.00721 0.03669 O27 15186 0.14438 
69 3976 -0.00788 0.03624 OR OSGeo -0.04834 
70 4032 -0.00718 OE 0s655 Oe 20713 OS Faw? 
71 4088 ~0.00822 O03 5s0 Ome2q447 On eas 
72 G144 “0.00846 203835 0.19317 Oeaho2 11 
73 4200 1015 ONS 7/ 0.03501 OF lomo 0.07293 
74 4256 -~0.00830 0703363 0. 16786 0.02438 
75 4312 -0.01061 0.03401 0.19544 0.11400 
Table VIII. (Continued) Sample moments fOr 10,000 
realizations of the stochastic approximation quantile 


estimator for the 0.99 Giantaskess of the exponential 


agustribution. 
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n 


76 
77 
78 
719 
80 
81 
82 
8 3 
84 
85 
86 
87 
88 
89 
20 
91 
2 
26 
94 
95 
96 
o7 
Shs. 
a, 
100 


Table 


realizatiois 


Non-parametric 


X Sanrple 


Size 


4368 
4424 
4480 
4536 
4592 
4648 
G704 
4760 
4816 
4872 
4928 
4984 
5040 
5096 
2152 
5208 
5264 
5320 
313) 146 
59432 
5488 
5544 
9600 
5656 
5712 


Vill. 


estimator 


distribution. 


Stochas 


Mean Variance 
-0 .00740 203339 
—0.00G50 OF 052-30 
=~ On OUas 7 0.03264 
Se Oren Oia te? 05:03 185 
~0.00781 0203 163 
~0.01054 O703183 
= 010 9o 3 Oa0s0 32 
= 10 0d S55 0202269 
0.00731 On 02907 
-0,00968 Orr 02 2e0 
-~0.01065 0.02851 
Orme 3 O02 597 
-0.00924 0.02800 
sO O00 709 0.02854 
lc UN Se 0.02816 
9) OU SO Va02795 
~0.00654 0.02701 
=O 0GE 6 0.02680 
~0.0084 3 0.602677 
=O 2 OOo O2G2617 
-~0.00751 OBO2623 
~0 .00658 0.02546 
-0.00448 0. 02596 
a) OO ee Or 02503 
-0.00439 0.02514 

(Continued) Sample 


fie Stochastic 


the. 0,99 


quantile 


157 


Yantile Estimation Through 
1c Approximation 


Skewness 


OF 238 92 
0.17281 
0.714122 
Oz 21379 
0.18469 
0.20911 
0.18384 
0.18287 
Crp stele) =) 
0217036 
0.17953 
0.16184 
Oe USrslos 
Orso 23 
Om tah? 5 
0.19455 
Ue oit/ eile, 
Celio 
Oe 11591 
G.14022 
0.14651 
G213555 
0.11760 
0.13814 
0.17244 


monents 
approximation 
of the 


Kurtosis 


0.40494 
0.14084 
0.00701 
0.12254 
0.08219 
0.24620 
0.18129 
0.19079 
0.164855 
0.00225 
0.11817 
0.05916 
0.11181 
0.15899 
0.20298 
0.03783 
0.09320 
0.07674 
0.02051 
6.10174 
0.18201 
0.08283 
0.07470 
0.06399 
0.14293 


10,000 


quantile 


exponential 





Non-parametric 
Stechas 


t 


uantile Estimation Through 
1C Approximation 


X Sample 

mn - Size Mean Variance Skewness 
101 5768 —O2 00564 0.02448 0.15044 
102 5824 OOO G oa 0.02504 0.15366 
103 5880 =a 0037 2 0.02453 0.15101 
104 2936 ~0.00564 OROZ 30s 0.17769 
105 SSS 10) = ONO ee Bs: 0.02428 O13 394 
106 6048 =O 90 3 0 02360 Or 03:756 
107 6104 =OOU0S2> O32 02355 0.15209 
108 6160 -0.00581 On 02 292 On 13390 
109 6216 =O Out 0.02304 Gels 107 
110 6272 =OL~008 10 0.02254 0214520 
111 6328 =0200507 0.02230 0.17181 
112 6384 -0 .00959 0.02188 0.15548 
113 6440 10) 108 0636's) Ore 02709 OS i16e2Z 
114 6496 =O 00728 O02 65 0.14474 
115 O12) BZ -0.00714 0.02170 0.14750 
176 6608 OR 00212 0.02154 Oe rh 13 
14% 6664 -0.00851 0.02154 0.13412 
1718 6720 ~0.00606 O202 113 0.15224 
119 6776 -0.00462 0.02115 0.15435 
120 6832 -0.00603 0.02020 0.13705 
121 6888 =0.00630 0.02029 0.11964 
Table VIII. (Continued) Sample  monents 
realizations of the stochastic approximation 
estimator the 0.99 guantile of the 


mestribution. 
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Kurtosis 


0.14362 
0.11828 


0.05956 


0.17094 
0.13484 
0.07061 
0.06478 
0.10493 
0.10158 
0.00261 
0.10032 
0.07619 
0.00358 
0.06447 
0.07069 
0.10814 
0.03059 
0.03194 
0.07507 


~0.00582 
=O. 06280 


107000 


guantile 


exponential 





Non-parametric Quantile Estimation Through 
Stochastic Approximation 


ie 





100 1cd 146 
Beegure 19. Coefficient of skewness of the stochastic 
PPPLOxIMation estimator for the 0.99 quantile of the 


exponential distribution vs. stochastic approxination step 


number. 
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Non-paranetric Juantile Estimation Through 
»eLOGChaSt le Approximation 





S) 29 49 69 EO iGo Leo 1.40 


Bagurce 20. Soctiveient of kurtosis of the stochastic 
approximation estimator for the 0.99 quantile of the 
exponential distribution vs. stochastic approxination step 


humber. 
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Non~parametric Quantile Estimation Through 
Stochastic Approximation 


clearly supports the conjecture that S' converges in the 
n 

fourth mean and that both , and e Caplidly approach zero. 

See Figure 19 for a plot of the skewness and Figure 20 for 


mene kurtosis. 


The generally positive kurtosis values indicate that 
confidence intervals for the mean based on the asymptotic 


normal theory will be slightly too narrow since the tails of 


Paes Gistribution of sS* will be heavier than those for the 
n 


j 
normal case; this 1s confirmed empirically in Section 


ii... The positive skewness values probably derive fron 


the shape of the distribution of the starting value oa 


which from Figure 11s markedly skewed to the right. Note 
that neither ie Nor Y, 1S great enough for X samples 


larger tena te 3000 observations to cause objectionable 


departures from normality. 


Figures 21 #.through 23 allow us to examine the 
convergence of s¥** in distribution more directly. These 
n 


histograms were computed from samples of 2500 replications 


of s*', s*t and s**t, respectively. In this case the 
a0 100 150 


replications were not independent; this enables one to gauge 
the progress of a specific {S'} sequence. The F*'ts_ plotted 
n 


on the histograms are a kernel estimate of the underlying 
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Non-parametric Quantile Estimation Through 
Seeecia stile en ppLoxlmdat1 On 


“ 


density of the s** population; such density estimates have 
n 


been found to give better insight into the nature of the 


underlying distribution than does the histogram alone. 


In general, the histograms reinforce the conjecture 


that 5' is converging rapidly to normality; in all three 
n 


cases, the density has a definitely Gaussian shape which is 


Slightly skewed to the right, the degree of skewness 
decreasing with increasing n. The sample extrema and range 


also decreaSe in a Satisfactory manner. 
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Chapter V. JOINT ESTIMATION OF A SET OF QUANTILES 


In this Chapter we address the problem of obtaining 
estimates for several different quantiles from the same X 
population based on a Single sample a This problem 
memole Of COnSiderable practical interest since one usually 
Wishes to estimate more than just a single extreme quantile 
in data analysis or Simulation studies. The problem also 
constitutes the primary area of application for the new 
stochastic approximation methods descrihed in this work; as 
long as only one quantile is to be estimated the order 
Pweaerstic technigues of Section IITI.A can be quite modest 
in terms of both computation time and memory but they are 
completely impractical when dealing with ten or more 


quantiles at a time. 


The major developnent in this Chapter is a computer 
program which is capable of providing estimates of the 
moments and quantiles of an arbitrary population given only 
sequential independent observations on the random variable. 
The total computer memory requirement (besides the code for 
the program) 1S just 150 memory cells per random variable. 
As Lewis [22] points out, there is often a requirement in 
statistical Sampling experiments or systems Simulation 
studies to collect simultaneous estimates on 309 or more 
random quantities; the FORTRAN subprogram QUANT given in the 
Appendix represents away to do this with a reasonable 
amount of memory. The subroutine could thus be used 
directly in Lewis' COMPSTAT package [22] at a considerable 


Saving in memory. 
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A. An Estimation Algorithn 


Mtiioasic approach to joint quantile estimation is to 
employ the nested design of Table I of Chapter I with, the 
algorithm for the new stochastic approximation method.given 
in Subsection IV.D.2. The main complication is that we must 
now provide a data structure to accommodate all of our set 
of estimates as well as the other information required to 


find the respective section maxima and minima. 


We assume that the population median is to be estimated 


along with the a and nee) quantiles es j=-ljpees,tOp. ~The 
quantile estimates are to be kept in array s with the median 
estimate in s{0], the oe quantile in s{27j-1] and the fe 
quantile in s{2j]. A second array f is also required; f[k] 


Will contain the density estimate corresponding to the 


quantile estimate in s[k)]. 


Each quantile estimate also requires the five values n, 
b, m, h and a to be stored, just as in the singl2 quantile 
algorithm of Section IITI.D.2; we may use the same value for 


each of these variables for both the a and the (i-a_) 
J J 


guantiles in this case, however, so we can save Some memory 


storage here. Since we will be applying the maxinun 
transform, we also reguire arrays u, max and min; uf j] will 
contain the size of the sample section considered so far for 


the a quantile, max[j] the largest valve in the section and 
3 


minfj]) the smallest. One final array v will contain the v 
values for the maximum transform for each quantile. Since 
we use a nested method for determining the respective maxima 


here, a v[{[j] value of 2 means that the section for the a. 
7 
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quantile is twice as large as the section for the a 


quantile. 


The values in the a and v arrays must be precomputed 
and will remain fixed throughout the estimation process. 
The remaining arrays must be initialized at the beginning of 
meres algorithm just as in Subsection IV.D.2. In the ALGOL 
description below we suppress the initialization steps as 
they tend to obscure the operation of the method. We give 
an ALGOL~like description both because ALGOL is the standard 
language for setting forth algorithms and also because the 
result is more easily understood than a FORTRAN progran. A 


FORTRAN implementation is given in the Appendix. 


comment This first section carries out the stochastic 
approximation process for the median. 
The algorithn updates the various 
stochastic approximation arrays given 
the single input observation X; 

moe | SiO] - KI; 

micep Uy then £f0}) := £f0] + (bf 0] - t) / bLO ]¢; 


comment upper bound on divisor; 

nh := n(O0] * h{0}; 

if f[0] > nh then d := nh else d := f{0]; 
S0uee= sto) + y / d: 

hfO}] := h{OjJ + 1 / nl OJ; 

REO s= nf Oj] + 1; 

blo} : “4 seh o43 ) * bf O07; 


comment here we pass the X values one at a time outwards 


i 


it 


to the other quantiles; 


max{ 1] := X; min{ 1} 
“| eG ee 1 
while j < top do 
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begin 
comment first we update the current max and min values; 
if j > 1 then 
begin 
1f uf[j] = 0 then 
begin 
max{( jj] := max{j-1); Mong |e UA aoe 
end 
else 
begin 
1£f max{ jj) < nax[j-1] then max{j] := max{ j-1]; 
1£ minfj}] > minf{j-1]} then min{j]} := nin{ j-1] 
end 
end; 
ieee = UL) + 1; 
comment determine if the current section is complete; 


if u{jj] # v£j] then j := top + 1 else 


comment this section is for the alpha[j] quantile; 
© s= |sik] = maxi v4; 

Peer Dig thenvttk |) t= £fk] + (bf 7 ]=t) Jbl j Ie; 

if max{j] $ s[k] then y := a{jj-1 else y := a{j]; 
Miss EET) * hl os 

1f f{k] > nh then d := nh else d := fF[K]; 

Stic s[k] + y / da; 


copment this section is for the (1t-alphaj Jj J) 
quantile; 
t := Is{k+1] - minf{ j]{; 
if t < bij] then f££k+1] := f[k+1] + 
(oC 3 J-t) Zo 5 J¢; 
if minfj] < s{k+t1] then y := -a{jJ 
else y := 1 - a{jl; 
Mime de) > thi then d@ := nh else d := £[k+1]; 
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S[k+1] := s{k+1]) + y / d; 


comment here we update the constants for the 


alphal{j] quantile; 


Bie nj) + 1 7 nf 5); ees atl ee 
Olahie. = v= 5 3 55 00 6) (ag lad fr 

Cj] ( sat 43 ) Bo 

fesse opt 1; Ko t= oka f-2 

end 


end; 


The introduction of an initialization section makes the 
algorithm somewhat more complex, but even greater difficulty 
ensues when we combine all the arrays into a single data 
Structure (which is also an array) as we do in the FORTRAN 
Subroutine QUANT which is listed in the Appendix. This use 
of a Single array has the advantage, however, that we may 
now accumulate quantile estimates on several different 
random variables as long aS each one is allocated its own 


estimation array. 


We may incorporate the next-to-maximum transform into 
this scheme by adding yet another array nextmax to our 
algorithm (or an extra set of memory locations into the 
Single array es has been done in QUANT, for example.) The 


section maximun update steps in the algorithm now become 


if max{j-1] > nextmax[j] then 

if max{j-1] > max{j] then 
begin 
nextmax{ jj := max[j]; 
max{j]) := max{j-1]; 
if nextmax{ j~1] > nextmax{j] then 

nextmax{ 7] := nextmax{j-1] 

end 

else nextmax[ Jj] := max[j-1]; 
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A second ¢rray nextmin with a Similar update sequence will 
also be required for the lower guantiles. The stochastic 
approximation operations will then be carried out using the 


values in nextmax{ j] and nextminf j]. 


Either version of the joint estimation algorithm 
requires that we have available fairly large samples of 
data. In order to obtain varaince estimates for the most 
extreme quantiles we needa minimum of 4v+3 observations, 
i.e. a total of 2691 for the maximum transform design of 
Table I and 6147 for the next-to-maximum transform design. 
This emphasizes the point that stochastic approximation 


quantile estimation is a large sample technigue. 


B. Reordering Techniques 


The first discrepancy noted when using Monte Carlo 
methods to investigate the performance of the algorithn of 
Section A is that the resulting quantile estimates are 
sometimes not in the proper order. In what follows, we 
assume that we are to estimate the a(1),a(2),-...,a (mM) 


quantiles, where a({i) < a(j) for i< j. Since s Satisfies 
a 
F(s ) = a and since every distribution function F(¢) is 


monotone, we must have 


(1) SS eer EOe a (1) <a ()) 
a (1) a (3) 


with strict inequality when F(e) is continuous. In any 


Gevent, if the joint estimates 5  (n) > S |. (n}) result 
a (1) a (9) 


‘from a sample ace from the parent population ve 
n 


ae 
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& 


Clearly have an error for which we should make some 


adjustment. 


This adjustment may be made only after the final set of 
estimates is obtained or it may be carried out dynamically 
throughout the estimation process whenever any of the set of 
quantile estimates violates the relation (1). It turns out 
that the dynamic readjustment of the estimates can 
materially improve the overall precision of the final 
estimates, where we adopt as a measure of this precision the 
total squared error of the set of m quantile estimates, i.e. 


rs n) - 2 


. (n) S ] 
L a (j) a (J) 


K3 
1 
li({48 


mn 5 


The expected value of T is just the sum of the nean 
mn 
Squared errors of the individual quantile estimates. None 


of the readjustment processes considered here changes the 


asymptotic distribution of s§ Since none of them will be 
n 


used if the set of quantile estimates satisfies (1); the 


almost sure convergence of 5§& implies that the order 
n 


relationship (1) will hold almost surely for any sequence 


of joint estimates. A reduction in the value of E[T ] thus 
mn 
represents a decrease in the bias of the ind iy rdiwe 1 


estimates 5 (n) rcather than a change in the asymptotic 
aa} 


lee 
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variance. 


One way to reduce the expected value of T for m 2 3 
nn 
is to employ the James-Stein estimation process (for an 


explanantion with examples see Efron and Morris [8]})). 


Briefly, the idea is to decrease the value of each §& (n) 


a (3) 


Slightly [the amount depends on the actual variance of 


Ss ae so aS to move the estimate closer to the surface 
acy 
Ot the m-dimenSional hypersphere on which the point 
S iene S lies. 
Beat) a(n)’ 
The set {fs pas } of quantile estinates does 


S 
a (1) a {m) 
not exactly satisfy the requirements for the James-Stein 


adjustment since we do not know the precise theoretical 
variances. Furthermore, although some perturbation of the 
order of the set of estimates occurs, the adjusted set does 
not in general satisfy (1). The James-Stein technique was 
applied dynamically (using estimated variances) during the 
Stochastic approximation joint quantile estimation procedure 
and it was found to make the properties of the extreme 
quantile estimates materially worse. We thus reject this 


method of adjustment. 
The most straightforward of the methods that has been 


found to reduce the expected value of T in some cases 1S 
we 91 
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Simply to adjust any of the 3  (n) values which fall 
J 


a (9) 


outside of the interval ae J back to the nearest 
| n 


boundary of the interval. It is quite easy to keep track of 


the sample extrema Ray and i, Since the process requires 
n 


only two additional memory cells; the subprogram QUANT in 


~ 


the Appendix was designed with this capability. 


From (3.1), the probability that the sample range 


[ xX a j covers the a-quantile is just 


(1) (n) 


3) too 


iA 
P< 
I 


Pr{x SS } 
(1) - (n) 


n n 
Se ee CN = A) 


thus the adjustment is more likely to reduce the bias of 5 
n 


as the sample size increases. Since the initial estimate is 


| Views 
based on a Sample of size 3 v, where a = 0.5, the 


probability that the interval for the first maximnun 


transformed estimate =. contains S iS approximately 0.875; 
a 


this follows because the interval is [xX re } and 
(ey fv) 
Ve ot _ 

a =mumOowon Ine pEobabilitcy that the interval for  s'* 

n 

ee n+ 1 yy 
contains s 1S Similarly 1 -— ¢ 5 ] , Which rapidly 
a 


approaches 7.0. 
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-For-. reasons of practical utility we prefer to carry 
this so-called extremum adjustment (as well as the other 


adjustment methods) only when the value of the most extreme 


quantile estimate § Cm) changes; in the maximum transformed 
a (nm 


@aoe this wall occur for each v[m]} observations on X. This 


not only decreases the amount of time devoted to the 
adjustment process but it also can be done very conveniently 
in the aligorithn. In subroutine QUANT, the call to 
subroutine CHECK near the end of the quantile estimation 


loop is an invocation of the order adjustment method. 


AS can be seen in Table IX the extremum adjustment 
apparently helps slightly in the small sample exponential 
case but there seems to be very little basis for adopting 
this method in general. Furthermore, the extremum 


adjustment will have no effect on violations of (1) unless 


both quantile estimates lie outside a at ) J in the sane 
n 


direction. We thus seek a general technique for dealing 


With estimates which are in reverse order. 


Such a technique arises from considering the problem of 


estimating the means Pe and a of two independent normal 

random variables xX _ and Ks with respective known variances 
1 

o2 and of. If we have a Single pair of realizations 2s 


and Rae the maximum likelihood estimators B and ue arise 


Nes 
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from minimizing the quadratic form 


2? a 2 
Q(@ ,) = 2 ah i : 
ae, pe 
7 
feeeeresult is clearly pnp. = x , i=1,2. If we know a_ priori, 
a i 


however, that Z, > Bs and 1t happens that - < oe we must 


solve the guadratic programming problen 


min Om t.) 


Subject to 


i 2 ii 


71 2 


in order to obtain the maximum likelihood estimators. The 


required minimum occurs at 


= fi = 1 a 
f Bo = 4-1-2. 2 


Note that this is just a weighted average of the x ‘'s, the 


i 
foeomesebeing Chosen aS 1/07; in a sense, the weight w for 
i i 
x is just a measure of the precision of x. aS an eStimate 
a 1 
of np. 
a 


The foregoing discussion iS an example of so-called 
"isotonic"' regression techniques (the term isotonic means 
"order preserving"). These techniques are applicable in 


Situations far more complex than our present Simple 
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requirement that 5 < os a ee : for more 
a (1) a (2) a (m) 


sophisticated applications as well as a summary of the basic 
theory see Barlow et al [1}. The isotonic adjustment 


technique for the situation where S |. (n) > S|. (n) is 
a (1) a (1+ 1) 


then to use as estimates for both quantiles the same value, 


nanely 


wos (n) + w S n) 


S 
ee tel Get: 
¥ + W s 
1 1+1 
the weights used here are just the reciprocals of the 


estimated variances, i.e. 


n Be 


i. "37 awrets"attr 


where B is the density estimate for f(s fy The value of 
n a (i 


nmin (2) may change with a(i) depending on the maxinum 


transform scheme used, 1f any. 


The main complication here is that the entire set of no 
quantiles must be ordered rather than just adjacent pairs of 


—_ 


<so after 
a(it2) a(1+1) 


estimates; thus, if it is found that §s 


the adjustment of the previous paragraph is made it will be 
necessary to set all three of the estimates to the sane 


value which is now 


(n) 


— 2 ae eee een 6? eee woe Gee ew eee eee eee ed ee ee oy ee ee ee eee ——> OE ane —_ —» ea 
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Memenave now created a block {s  ,sS . 5S } of 
aj aes 1) a (2 + 2) 


estimates whose values are egual;: if this constant value is 


not in the proper order with respect to some other adjacent 
block of estimates it will then become necessary to coalesce 


the two blocks together in the same fashion. 


An algorithm for manipulating the blocks in this manner 
was developed by Kruskal [20]; it is also given by Barlow et 
al [1]. This so-called "up-and-down blocks" algorithm has 
also been implemented using the weights (2) for the data 
Structure used by the QUANT subroutine. The resulting 
FORTRAN program is called CHECK and is listed in the 
Appendix. 


A possible extensicn to the isotonic adjustment is to 


adjust the density estimates B at the same time that the 
n 


quantile estimates are adjusted. There is of course no 


reason to suppose that the densities will also be in order, 
but it seems reasonable that if all the quantile estimates 
in a block have the same value that all the corresponding 
density estimates should also be constant. This may be 


accomplished using the same weights as used for updating the 


S . values. Alternatively, we may adjust each B- so that 
a (1) n 


the estimated variance calculated by (3.8) 15 (Os each 
eStimator in the block is the same. Recalling that we chose 


Ww = 1/02, the variance of the block average in block b is 
a af 


given by 
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1 
The adjusted density estimates are then given by 
3 Be = atij){ t-af{iy ] VY , 
(3) i) fo Aaa Yd . 


where n(i) is the sample size for the a(i)-quantile. This 
second scheme was in fact investigated; Monte Carlo resuits 
for both the isotonic adjustment technigue and the isotonic 


technique with density modification are given in fable IX. 


Re 1s apparent from Table IX that the isotonic 
adjustment method greatly improves the expected Ora 
Squared error of the set of quantile estimates. The 
decrease is over 50 &% for both the normal and exponential 
cases. The density adjustment, however, does not improve 


E(T j nearly as much if, indeed, it improves it at all. 
mn 


One difficulty encountered in using the isotonic 
adjustment technique is that if one of the extreme quantile 


estimates (say a Nae) is out of order with respect to an 


SS ) then 


eStimate on the other extreme (e.g., 5S 
Qe IS 0.05 
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Adjustment Norcmal Distribution Exponential Distribution 
Method 
. n=6720 n=67,200 n=6720 n=67,200 
Unmodified 1.9348 0.0444 nce 23 G25 722 
(0.5396) (0.0328) (1.7157) (0.4473) 
Janes-Stein eo Wey 6.7694 x % 
(0.0561) (0.0607) 
Extrena 1.9347 0.0444 TER SING Om 5 123 
(025396) (0.0328) Cis 7 (5:0) (0.4473) 
Isotonic Of 262 O20105 Sime y) 0.2131 
(Om0 3/612) (0.0007) (h- 0650) (0.1332) 
Isotonic 25109 0.0492 6.2388 * 
(Density) (0.4205) (O50 732) (125262) 
Limited i 1S) (teks: Om0197 2.0328 0.1343 
Reorder (0.5766) (0270075) (1.0749) (0.0501) 
Limited Tat ogs O06 2 * xe 
Reorder (0.4939) (0.0426) 
(Density) 
Table IX. Mean of the total squared error T for the mn = 


an 


19 guantiles of Table I using various reordering methods’ to 


adjust for estimates which are out of order. Values are 


the mean of 100 replications of each T statistic; numbers 


mn 


in parentheses are the estimated standard deviations of the 


given estimates of E[T ]}. Asterisks (*} denote experiments 
mn 


that were not conducted. 
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all of the intervening quantile estimates will be set to the 
same value even though they may be close to their correct 
values. The extent to which this may occur depends on the 
parent population but it is likely to be a problem since the 
extreme quantile estimates will be the most variable, 


especially for moderate sample sizes. 


One way to overcome this difficulty is to use a 
"limited" reorder scheme in which each estimate is checked 
With respect to those immediately adjacent. If it is found, 


for example, that 


< s 
a (i) a (1-1) 


but that 


—= = 


Ss. Ss. ; 
a (i-1) a (i+1) 
then we discard the old estimate § (iy and set 
a (i 


— on 


= ae aay CED = ae aes ee =e Ep ge © o> OSD Sew Se =~ =p a 


a (i) 
Ww + Ww. 
T= 1 1+1 
If the estimates S_. and Ss ., are also out of order 
a (1-1) a(itt) 
we merely Grainy, out the usual isotonic regression 


adjustment. 


The limited reorder adjustment may also be applied with 
the density adjustment (3) used in the isotonic case. The 
results from Table IX indicate that this method shows some 
promise but it does not appear to be generally as good as 


PiommErcOtonic Case. Once again, the density adjustment does 
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not seem to be useful. 


The results in Table IX show a substantial reduction in 


E{T jwhen we adopt the isotonic adjustment; as mentioned 
mn 


previously, this iS an indication of a reduction in the bias 


of the &s ry S. it 1S possible that this bias reduction 
a (i 


Will now make the stochastic approximation estimators more 
competitive with order statistic estimators. Direct 
computation shows for the order statistic case, however, 
menac the total squared error for a sample of 6720 
exponential variates is 0.2907 and it is 0.0285 for 67,200 
Observations. Thus, a better reordering method is needed to 
obtain comparable bias results. Even though it is possible 
that the stochastic approximation estimators can be ‘further 
improved, we will be unable to improve the order statistic 
estimators any further in this way since none of the 


reordering methods are applicable in this case. 


Our conclusion then is that the isotonic adjustment is 
a robust and flexible method for reducing the expected total 
Squared error of a set of stochaStic approximation quantile 
estimates and that simultaneous adjustment of the step size 
parameters is not indicated. The limited reorder adjustment 
may be better in some applications; more work could be done 


mn this area. 
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Chapter VI. PUNGTVONS SOF SUANTILES 


In this Chapter we investigate the question of whether 
our methods can be adapted to the joint estimation of an 
unknown quantile and some random function of that quantile. 
Of course, one case in which we already know that this can 


Memmoome 91S the estimation of @= f(s ) using a kernel 
a 
estimator since this density estimate is used directly in 


the quantile estimation process. We first determine what 
kinds of functions we may use in this joint estimation 
procedure and we then give an example which is of practical 


Mmeenin statistical simulation studies. 


A. Sufficient Conditions for Convergence 


Given a sample Nigar from a papulation With 
n 


distribution function F(e) satsifying (Fi) and (¥F2) we 
obtain the corresponding a~quantile estimates 


3 0S aaa a a At each stage of the process we also have a 
n+ 


random vector Y (possibly empty) which we use to compute 
n 
the value of the known function P (S ,X ,¥ ); we are then 
n n nen 


interested in the properties of 


(1) oe a P (S_,X_-Y_) - 
1 1 al 1 
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Of course more general formulations involving several 
previous X or S values are possible but since our emphasis 
n n 
throughout this work is on methcds which conserve storage we 
limit ourselves to the formulation (1). 
We approach (1) in the same way as we proved Theorem 2 


in Chapter II; first, however, we must redefine the sequence 


meeeo-tields B = O(S 3X ,~2e0o,X gr rerre f ) to include 
n 1 71 i | 1 nS 


the ¥ variables. Then we write 


(2) Sess) S02 (Sk, ht alls. 
n oon nonin a n 


Expanding (1) we have 


(3) Da oe nes een ee et) 
n 1 de Sire eae 1 


fnew Gibrst term in (3) will approach 0 almost surely 


according to Lemma 2 if we have 
(4) Var{l P (sere 7Y 1 = O(n), 
em al 
se —_ ° 
Since then > n-2 Var{[P (S ,X ,¥ }] will converge. 
n=1 nnn on 
The second term in (3) will converge a.s. according to 


Lemma 5S as long as t (e) is measurable and uniformly 
n 


eontinuous for every n 2 N; in this case we have 
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ey (se) ~-> t(s ) and so 
h a a 


t (S ) --> t(s ) a.s. 
he eit a 


In view of Lemma 6 we have thus oroved: 


Theorem 5 As long as P (S ,X ,Y ) Satisfies (4) and t (e) 
nn oni a n 
given by (2) is measurable and uniformly continuous then 
Pear eS | die Sey 
n a 


where p is given by (1). 
n 


B. Applications 


In a statistical Simulation study we mav generate 
sufficient pseudo~randon Samples OFX to obtain a 
n 


Satisfactory estimate S of the a-quantile and then repeat 
n 
the experiment and compute p uSing the final quantile 
n 


estimate value, 1.e. we calculate 


n 
Pe 2S X,Y) . 
1=1 


p' = 
n 1 nett 2 oe 


a— 


This value should have a lower bias than p (at, least in the 
n 


first few terms) Since it 1S based on a more correct 
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estimate of s . We may also use the p' estimate with a 
a n 


fixed data sample which is ctecorded on a storage mediun 


which allows re-examination of the data, e.g. magnetic tape. 


If the X values are difficult to generate, however, it 
may become prohibitively expensive to repeat the entire 
experiment from the beginning to take advantage of the 


presumably lower bias of p'. It may also he impossible to 
n 
repeat the early xX values if the source of the data is a 


real-time system of some sort, for example. In these cases 


we prefer to use the dynamic estimate p in order to 
n 


conserve memory storage requirements. 


The basic application envisioned for this technique is 
the estimation of empirical distribution functions and 
percentiles (see the next Section). It may also be used for 
estimating density values from other distributions, i.e. ve 
take 


= Sea 
P (s ,Y¥) = _1. 4 [ a ew | : 
noni oon b 9) 
n n 
Pyerdenciy then p <--> cae ) in this case as long as the 
n a 


distribution function F (e) of the Y population satsifies 
Y 
(F1) (see Lemma 7). This same method may be readily 


extended to the estimation of joint density functions. 
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cs Power and Level of a Test 


AS an application of our method we consider the 
statistical simulation problem of estimating the level of a 
statistical test and then determining the power of the test 
against various alternative hypotheses at the chosen level. 
Suppose, then, that we have a simple hypothesis a and a 
finite set of simple alternate hypotheses (tie cio pk Zoe rire 

| m 
test statistic fT is proposed for testing Ha the (unknown) 
Seouetoution of T under H  .m111 be denoted by F (¢#), 
J J 
jJ=0,1,-4e,m. We assume that Be Satsifies (F1) and (F2) 


macmtiatc Cach Of the Fae(*#), j=t,-.--,M, satisfies (F1). 
J 


We wish to determine a level T for the test statistic 

a 
T such that the probability of a Type I error in testing a 
will be a. ASsuMming that the test region is T ¢T, the 


ot 


test level is the solution to 
Pr{Tl Ss T (jHeje= 1° a; 
a 0 


Or 


PeeGe ) <= Wer 
os a : 


ie. yr is the 1 - a quantile of Es Tt 26 
a 


straightforward to extend this to other test regions. 


Realizations of the statistic T are now obtained by 
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On 


0 0° 
sampling sets Ie he of k values each From a 


population satisfying ine in the Simulation context, these 

samples are generated by a pseudo-random number generator. 
QO | 0 

The value T is then computed from the nth {X } sample and 
n 

may be used to obtain a new stochastic approximation 

estimator of T using the algorithm of Chapter III (or 

a 


Chapter V if several different values of a are of interest). 


We denote this nth sequential estimate of T by fT. 
a n 


0 
Now suppose that in addition to the {X } sample we have 


Samples <irOm populations satsifying HH , jJ=ty.-+eM; we 
denote such a sample by pone Note that it may be very easy 


0 
to generate such samples given the basic {X } sample; 1f, 

0 
for example, the null hypothesis involves E[X ] = 0 while H. 
J 

. y 2 
requires E(X J = p. #0 then each {X } sample may be 
J 
. 0 

generated by adding an appropriate constant to {X }. From 


each x7} Sample, then, we compute the statistic I[, denoting 


the nth realization by r. 
n 


188 





Non-parametric Quantile Estimation Through 
Stochastic Approximation 


The power of the test based on T is just the 


probability that under Ze the statistic —T— fails the test, 


ae Cs 


i 


(5) P Beare F { Ho} 


J 


1=F (TP )g- 
fa a 


Note that the power defined by (5) 1S one minus the 


— -percentile of F (e). According to Theorem 5 we may then 
a J 


: y) 
use aS an eStimate of p 


_J n 5 
2) i ray ie san ee ae 
n Ne b= dw 
as long as p(T ,T>) has the correct properties. im Lace, 
gag els a 
if we choose 
: Oat r? Se 
5 y 1 
(6) PeGieg, L.)) = 
1° 2 5 
le. ees 
1 1 


then we have 
5 
Vane, (le) Sal Stan 
i Bee q 
and 
5 
(7) Riot, te pelea fo F (1 ) ass. 
1 2: a Pol 


Now (F171) guarantees that F (*) will be continuous in some 
J 
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Grtosed neighborhood of s and since 7? (*) is bounded it will 
a J 
be uniformly continuous there. Thus, 


=o 
n ja 


0 ; 
Note that (7) does not reguire that T. and r? be 
7 x 


0 
independent; in fact if we are able to use the {X } sample 


to generate (x they will certainly not be. A degree of 


Ld 


0 : 
positive correlation between T. and T , moreover, may 
a 


tH Oo 


actually improve the estimate ae ee ak 


is large then Tf. 
n i 


Will also be large; however, r is also large in this case 
at 
so that the tendency will be for (6) to add an appropriate 


value to oe 
n 


Since ve are usually interested in very small 
probabilities of Type I error, we will generally have the 
probability of error, a, very small. Hence, it will most 


often be necessary to use the maximum transform to estimate 


: 3 
ite In this case we continue to accumulate P (T_,T_) terns 
a Lt 2 & 


even though the value of T. has not changed since the 
as 


previous step. This does not change the analysis to any 


great extent; we are merely adding a binomial random 


Vamtebre to the Sum instead of a Bernoulli as before. 
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It 1S not hard to show, using Lemma 11 and following 


the lines of the proof of Theorem 4, that oh has an 
n 


asymptotically normal distribution. In fact, 


CM eee! a9 5 5 
(8) - Sa Soe Dc, el eee aE } . 


Some empirical investigation of this method has been 
carried out using the FORTRAN subprogram POWER listed in the 
Appendix. The example chosen was the estimation of the 


power of the t-test. The statistic is 


=e we we ee =o 


where Z iS a zero-mean normal random variable, da constant 


and s an independent estimate of Var{z] based on n degrees 
Zi 


cf freedon. When dis zero t has Student's t-distribution 
n 


with n degrees of freedom while t has a non-central 
n 


t-distribution when d # 0. 


The quantiles of both the central and non-central 
t-distributions may be readily approximated so that the 
results of the joint estimation procedure can be checked. 


The null hypothesis is HH : d= 9 while the alternate 


hypotheses are H : d = d_# 0. Because of the tine required 
J d 

to carry out the simulation no attempt was made to determine 

the order of the bias or to verify the asymptotic 

distribution (8); the results for several different n 


values, however, were in good agreement with theory. 
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Chaptes- VII. SUMMARY AND CONCLUSIONS 


A. Main Results 


The main contribution of this research is the 
development of a practical sequential quantile estimation 
method which can be applied even for extreme guantiles. 
Both the asymptotic and finite sample properties of this new 
method have been shown to be comparable to those of the 
order statistic nethod which is the most commonly used 
non-parametric technique for estimating quantiles; the new 
method requires only a small, fixed amount of memory for its 
implementation, however, and is thus superior to the order 


statistic estimator for large samples of data. 


Monte Carlo experience with the new estimation method 
Shows that it is guite robust with respect to the underlying 
distribution of the random variable whose quantile is to be 
estimated. Use of the maximum transform of Goodman, Lewis 
and Robbins [({14] allows the method to be applied even for 
extreme quantiles without the grossly unstable finite sample 
behavior which has characterized most attempts at stochastic 
approximation quantile estimation; see, for example, 
Wetherhill [36], Cochran and Davis [4] or Iglehart [{ 16}. 
Since the method also provides an estimate of the variance 
of the quantile estimate, confidence intervals on the 
quantile may be computed. This iS a sine gua non of good 
Simulation practice. The technique thus qualifies aS a 
flexible building block for use in data analysis Ot 
Simulation computer programs. Because of the modest memory 
requirements it may be used in such programs for dealing 


With more than a single random variable. 


Us le4 
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Extension of the scheme to the estimation of a set of 
quantiles allows further improvement of the results by 
taking advantage of the known order relations in the set of 
quantiles; the resulting reduction in the bias may be 
Substantial. Furthermore an entire set of quantiles such as 
the 19 values considered in this thesis provides .an 
excellent characterization of the distribution of a randon 
variable X; this information may be much more meaningful 
than just the moments of X, especially for highly skewed or 


outlier~prone data. 


The development of a technique for the simultaneous 
estimation of both the level and power of a statistical test 
is also a useful contribution. When carrying out such 
Statistical estimation experiments Monte Carlo methods are 
generally applied for a wide range of test sample sizes. 
The overall savings can be substantial since use of the 
Simultaneous eStimation method results in a saving for each 


test investigated, 


All of the algorithms described in this thesis’ have 
been implemented as FORTRAN subroutines; some of these are 
particularly flexible and are listed in the Appendix. 
Subroutine QUANT implements the joint quantile estimation 
algorithm Of Chapter V while subroutine CHECK implements the 
molec adjustment algorithm of Chapter V. Subroutine 
POWER is for the simultaneous level/power estimation 
technique of Section VI.C while QOUT and PWROUT print out 
the estimates accumulated by QUANT and POWER, respectively. 
Specific details of the data structures and a)gorithms 
employed may be found in the comments which accompany the 


subroutine listings. 


Sample output from subroutine QUANT is also included in 


Mmiomimm@endia- the indut data in this case was a pseudorandon 
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Sample fron the exponential distribution. The accuracy of 
the results may be judged by comparing them with the true 
values which are also listed in the Appendix. A sample of 
the application of subroutine POWER is also included; the 
input was the t-test experiment data described in Section 
Ver. C. Once again the true values are also listed for 


comparison. 


B. Proposed Applications 


As has been mentioned several times, Monte Carlo 
Simulation 1S the primary application envisioned for the 
improved stochastic approximation quantile estimator 
developed in this work. The large samples of data reguired 
to obtain reasonable results from the procedure are easily 
obtained in a Simulation experiment; further, the experiment 
can be designed so that the sequential xX observations are 
independent and have a continuous diStribution. The 
inevitabie development of larger and faster computers will 
make the technigues even more valuable as larger simulation 
experiments become possible. Finally, in Simulation work we 
usually wish to obtain estimates of high precision so that 
the magnitude of the biaS encountered in some order 


Statistic methods is often unsatisfactory. 


Giemwardorlcam Of —Section V.A could profitably he 
employed as.a part of a large-scale Simulation package (even 
though the implementation given in subroutine QUANT is for 
independent use). An example is the COMPSTAT program of 
Lewis [22] which was designed to allow the user to employ 
Monte Carlo methods to investigate statistical distribution 
problems; a large part of COMPSTAT is concerned with 
providing summary data on the statistics generated by the 


user and subroutine QUANT 1s ideal for that purpose. 
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The method is not as readily applicable to more general 
Systems Simulation studies (e.g., queueing problens) because 
sequential observations are often not independent in this 
case. If one is interested in steady-state behavior, 
however, the regenerative Simulation technigues of Iglehart 
{16} can be used to generate independent replications. 
Since these regenerative technigues tend to be fairly 
specific to the problem at hand some care must be exercised 
in using the improved stochastic approximation guantile 


estimator here. 


The question of independence is also an important one 
in deciding whether the new quantile method can be applied 
in a general data analytic role with "real world" data. A 
more important consideration here, though, is whether 
sufficient observations are available; subroutine QUANT, for 
example, requires a minimum of 2691 data points and this 
number will be much larger if the next-to~maximnun transforn 
is used. Given the memory size of modern-day computezcs, 
however, it iS reasonable to accommodate arrays of up to 
5000 observations in core storage; it will then be possible 
to use one of the order statistic methods of Section III.A 
directly on the sample. Since the order statistic 
estimators avoid the maximum transform variance inflation of 
the stochastic approximation estimators they should be used 


when it iS possible to do So. 


Two cases in which enough data will be available are 
real time systems and large data bases; in both cases 
obtaining information for system management is a topic of 
considerable current interest (see Gaver and Lewis [12]). 
In fact, so much data may be available in these instances 
that order statistic estimators cannot be applied because of 
memory restrictions. The modest memory requirement for 


subroutine OUANT would make it ideal for dynamically 
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accumulating data in:sa real time system; for example, 
estimating job execution time parameters in a computer 
Operating system can be done very easily without the 
necessity for saving a complete record of all the job times 


on some external storage medium as is usually done. 


Gaver and Lewis [12] give an example of applying 
stochastic approximation quantile estimation in large data 
bases. They suggest that the next-to-maximum transform be 
used and that sample maxina which deviate too far from the 
quantile estimate be subjected to verification by the 
Original source of the data as an automatic error correction 
device. In this application the density estimates provided 
by the improved method should be useful for deciding just 
when the maximum is "too far" from the guantile estimate. 
When working with data base information, however, care must 
be exercised that the data is sufficiently continuous to 


allow application of stochastic approximation. 


C. Areas for Further Study 


Three general areaS in which more work could profitably 
be done suggest themselves: improving and refining the 
Stochastic approximation quantile estimation procedure given 
here, investigating the performance of the procedure when it 
1s applied to other kinds of data than those considered for 
this thesis and extending the procedure to handle nore 


general kinds of inputs. 


The basic method set forth in Section III.D could 
perhaps be improved if a better kernel function or a_ better 
bandwidth sequence were chosen. There is the danger that a 
combination of density estimation parameters may be nearly 
optimal in one application and yet lack the robustness — 


displayed by the present choices; a practical choice nust 
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also be fairly rapid conputationally. A specific 

combination may be evaluated by uSing the regression methods 
“Vf2 

of Chapter IV to estimate the n bias component ina set 


of independent realizations of s*, n= 1,2,...3 |. some 
n 


investigation of the distributional properties of the 


estimator along the lines of Section IV.E would also be 


indicated. 


The joint estimation method could also possibly be 
improved by a better reordering scheme; the limited reorder 
technique, for example, shows Some promise here. Once again 
a new adjustment method should be fairly robust, not disturb 
the distributional properties of the individual estimators 
and be computationally fast. A more careful comparison with 


the order statistic case Might also be carried out here. 


The data used in the testing of the improved stochastic 
approximation quantile estimator was all from fairly 
well-behaved distributions and the resulting estimates were 
also well-behaved; the performance of the method in the face 
of outlier=contaminated data should also be investigated. 
The idea of Gaver and Lewis [12] for the possible rejection 
of section maxima as outliers based on quantiles estimated 
from the next-to-maxima would be a good place to begin this 
investigation. General use of the method as presented in 
this thesis on real world data might also disclose 
Shortcomings which might be overcome by using other kernel 


functions or by changing the starting values. 


It would also be interesting to determine the effect of 
using the stochastic approximation algorithm on data samples 
from discrete distributions or from an autocorrelated 


PeeGess OL some SOrt. Although convergence in these cases 
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is not guaranteed one has the feeling that the results ought 


not to be too bad for samples which are not too extreme. 


One criticism of the stochastic approximation estimator 
is that it 1s sensitive to the order in which the sample is 
obtained; a determination of the effect of the order of the 
Original sample on the final estimate might disclose how 
robust the procedure 1s in this case. Note that the process 
of reordering a sample may be used to introduce dependencies 


into the data, if desired. 


Finally we turn to extending the theory behind the 
stochastic approximation method to include X samples from 
populations more general than those allowed in Chapter II, 
e.g. those with weak dependencies of some sort or those 
which are discrete. Almost nothing has appeared in the 
literature on these questions but weakening some of the 
assumptions of Chapter II would provide a powerful extension 


to the method presented here. 
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